HUMAN GENES AND GENE EXPRESSION PRODUCTS

Cross-Reference to Related Applications 

This application claims the benefit of U.S. provisional application serial no. 60/188,609, filed
March 9, 2000, which application is incorporated herein by reference in its entirety.
Field of the Invention 

The present invention relates to polynucleotides of human origin and the encoded gene products. 
Background of the Invention 

Identification of novel polynucleotides, particularly those that encode an expressed gene
product, is important in the advancement of drug discovery, diagnostic technologies, and the
understanding of the progression and nature of complex diseases such as cancer. Identification of genes
expressed in different cell types isolated from sources that differ in disease state or stage, developmental
stage, exposure to various environmental factors, the tissue of origin, the species from which the tissue
was isolated, and the like is key to identifying the genetic factors that are responsible for the phenotypes
associated with these various differences.

This invention provides novel human polynucleotides, the polypeptides encoded by these 
polynucleotides, and the genes and proteins corresponding to these novel polynucleotides. 
Summary of the Invention 

This invention relates to novel human polynucleotides and variants thereof, their encoded
polypeptides and variants thereof, to genes corresponding to these polynucleotides and to proteins
expressed by the genes. The invention also relates to diagnostics and therapeutics comprising such novel
human polynucleotides, their corresponding genes or gene products, including probes, antisense
nucleotides, and antibodies. The polynucleotides of the invention correspond to a polynucleotide
comprising the sequence information of at least one of SEQ ID NOS: 1-2396.

Various aspects and embodiments of the invention will be readily apparent to the ordinarily
skilled artisan upon reading the description provided herein.
Detailed Description of the Invention 

The invention relates to polynucleotides comprising the disclosed nucleotide sequences, to full
length cDNA, mRNA, genomic sequences, and genes corresponding to these sequences and degenerate
variants thereof, and to polypeptides encoded by the polynucleotides of the invention and polypeptide
variants. The following detailed description describes the polynucleotide compositions encompassed by
the invention, methods for obtaining cDNA or genomic DNA encoding a full-length gene product,
expression of these polynucleotides and genes, identification of structural motifs of the polynucleotides
and genes, identification of the function of a gene product encoded by a gene corresponding to a
polynucleotide of the invention, use of the provided polynucleotides as probes and in mapping and in
tissue profiling, use of the corresponding polypeptides and other gene products to raise antibodies, and
use of the polynucleotides and their encoded gene products for therapeutic and diagnostic purposes.
Polynucleotide Compositions 

The scope of the invention with respect to polynucleotide compositions includes, but is not 
necessarily limited to, polynucleotides having a sequence set forth in any one of SEQ ID NOS: 1-2396; 
polynucleotides obtained from the biological materials described herein or other biological sources 
(particularly human sources) by hybridization under stringent conditions (particularly conditions of high 
stringency); genes corresponding to the provided polynucleotides; variants of the provided 
polynucleotides and their corresponding genes, particularly those variants that retain a biological activity 
of the encoded gene product (e.g., a biological activity ascribed to a gene product corresponding to the 
provided polynucleotides as a result of the assignment of the gene product to a protein family(ies) and/or 
identification of a functional domain present in the gene product). Other nucleic acid compositions 
contemplated by and within the scope of the present invention will be readily apparent to one of ordinary
skill in the art when provided with the disclosure here. "Polynucleotide" and "nucleic acid" as used
herein with reference to nucleic acids of the composition are not intended to be limiting as to the length or
structure of the nucleic acid unless specifically indicated.

The invention features polynucleotides that are expressed in human tissue, specifically human 
colon, breast, and/or lung tissue. Novel nucleic acid compositions of the invention of particular interest 
comprise a sequence set forth in any one of SEQ ID NOS: 1-2396 or an identifying sequence thereof. An 
"identifying sequence" is a contiguous sequence of residues at least about 10 nt to about 20 nt in length, 
usually at least about 50 nt to about 100 nt in length, that uniquely identifies a polynucleotide sequence, 
e.g., exhibits less than 90%, usually less than about 80% to about 85% sequence identity to any 
contiguous nucleotide sequence of more than about 20 nt. Thus, the subject novel nucleic acid 
compositions include full length cDNAs or mRNAs that encompass an identifying sequence of 
contiguous nucleotides from any one of SEQ ID NOS: 1-2396. 

The polynucleotides of the invention also include polynucleotides having sequence similarity or 
sequence identity. Nucleic acids having sequence similarity are detected by hybridization under low 
stringency conditions, for example, at 50°C and 10X SSC (0.9 M saline/0.09 M sodium citrate) and
remain bound when subjected to washing at 55°C in 1X SSC. Sequence identity can be determined by
hybridization under stringent conditions, for example, at 50°C or higher and 0.1X SSC (9 mM saline/0.9
mM sodium citrate). Hybridization methods and conditions are well known in the art, see, e.g., USPN 
5,707,829. Nucleic acids that are substantially identical to the provided polynucleotide sequences, e.g. 
allelic variants, genetically altered versions of the gene, etc., bind to the provided polynucleotide
sequences (SEQ ID NOS: 1-2396) under stringent hybridization conditions. By using probes,
particularly labeled probes of DNA sequences, one can isolate homologous or related genes. The source 
of homologous genes can be any species, e.g. primate species, particularly human; rodents, such as rats 
and mice; canines, felines, bovines, ovines, equines, yeast, nematodes, etc.

Preferably, hybridization is performed using at least 15 contiguous nucleotides (nt) of at least
one of SEQ ID NOS: 1-2396. That is, when at least 15 contiguous nt of one of the disclosed SEQ ID
NOS. is used as a probe, the probe will preferentially hybridize with a nucleic acid comprising the
complementary sequence, allowing the identification and retrieval of the nucleic acids that uniquely
hybridize to the selected probe. Probes from more than one SEQ ID NO. can hybridize with the same
nucleic acid if the cDNA from which they were derived corresponds to one mRNA. Probes of more than
15 nt can be used, e.g., probes of from about 18 nt to about 100 nt, but 15 nt represents sufficient
sequence for unique identification.
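
By way of illustration only, the relationship between a 15 nt probe and its complementary target can be sketched in Python as follows. The function names and the example sequences are hypothetical and are not taken from the Sequence Listing; the sketch models perfect complementarity only and does not model hybridization stringency.

```python
# Illustrative sketch: enumerate 15-nt probes from a sequence and locate the
# perfectly complementary site on a target strand. All names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_windows(seq, length=15):
    """Return every contiguous window of the given length from seq."""
    seq = seq.upper()
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def probe_binding_sites(probe, target_strand):
    """Return 0-based positions on target_strand where the exact
    complement of the probe occurs (read 5' to 3' on the target)."""
    site = reverse_complement(probe)
    target = target_strand.upper()
    hits = []
    start = target.find(site)
    while start != -1:
        hits.append(start)
        start = target.find(site, start + 1)
    return hits


source = "ATGGCGTACGTTAGCCTAGGATCC"            # hypothetical 24-nt sequence
probe = probe_windows(source)[0]                # first 15-nt window
target = "GG" + reverse_complement(source) + "TT"  # hypothetical target strand
sites = probe_binding_sites(probe, target)
```

A probe that yields exactly one binding site in the searched target is "unique" in the narrow sense used by this sketch; a real specificity assessment would search a full sequence database.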

The polynucleotides of the invention also include naturally occurring variants of the nucleotide
sequences (e.g., degenerate variants, allelic variants, etc.). Variants of the polynucleotides of the
invention are identified by hybridization of putative variants with nucleotide sequences disclosed herein,
preferably by hybridization under stringent conditions. For example, by using appropriate wash
conditions, variants of the polynucleotides of the invention can be identified where the allelic variant
exhibits at most about 25-30% base pair (bp) mismatches relative to the selected polynucleotide probe.
In general, allelic variants contain 15-25% bp mismatches, and can contain as few as 5-15%, or
2-5%, or 1-2% bp mismatches, or even a single bp mismatch.
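
As a minimal sketch of the arithmetic behind these percentages, the following Python function computes the percent of mismatched positions between a probe and an equal-length, ungapped candidate variant. The function name and both sequences are hypothetical; real variants may also differ by insertions or deletions, which this sketch does not handle.

```python
# Illustrative sketch: percent bp mismatches between two ungapped,
# equal-length sequences. Names and sequences are hypothetical.
def percent_mismatch(probe, target):
    """Return the percentage of positions at which the two sequences differ."""
    if len(probe) != len(target):
        raise ValueError("sequences must have equal length")
    if not probe:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(1 for a, b in zip(probe.upper(), target.upper()) if a != b)
    return 100.0 * mismatches / len(probe)


probe = "ATGCATGCATGCATGCATGC"    # hypothetical 20-nt probe
variant = "ATGCATGAATGCATGCTTGC"  # hypothetical variant with two substitutions
mismatch_pct = percent_mismatch(probe, variant)
```

Here two substitutions over 20 nt give 10% mismatches, which falls within the 5-15% range mentioned above.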

The invention also encompasses homologs corresponding to the polynucleotides of SEQ ID
NOS: 1-2396, where the source of homologous genes can be any mammalian species, e.g., primate
species, particularly human; rodents, such as rats; canines, felines, bovines, ovines, equines, yeast,
nematodes, etc. Between mammalian species, e.g., human and mouse, homologs generally have
substantial sequence similarity, e.g., at least 75% sequence identity, usually at least 90%, more usually at
least 95% between nucleotide sequences. Sequence similarity is calculated based on a reference
sequence, which may be a subset of a larger sequence, such as a conserved motif, coding region, flanking
region, etc. A reference sequence will usually be at least about 18 contiguous nt long, more usually at
least about 30 nt long, and may extend to the complete sequence that is being compared. Algorithms for
sequence analysis are known in the art, such as gapped BLAST, described in Altschul, et al. Nucleic
Acids Res. (1997) 25:3389-3402.

In general, variants of the invention have a sequence identity greater than at least about 65%,
preferably at least about 75%, more preferably at least about 85%, and can be greater than at least about
90% or more as determined by the Smith-Waterman homology search algorithm as implemented in the
MPSRCH program (Oxford Molecular). For the purposes of this invention, a preferred method of
calculating percent identity is the Smith-Waterman algorithm, applied as follows. Global DNA
sequence identity must be greater than 65% as determined by the Smith-Waterman homology search
algorithm as implemented in the MPSRCH program (Oxford Molecular) using an affine gap search with the
following search parameters: gap open penalty, 12; and gap extension penalty, 1.
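
For illustration, a Smith-Waterman local alignment score with affine gap penalties can be sketched in Python as shown below. This is not the MPSRCH implementation. The gap open penalty of 12 and gap extension penalty of 1 come from the text above; the match and mismatch scores (+5 and -4) and the convention that the first gapped position costs the open penalty and each further position costs the extension penalty are assumptions made for this sketch. The function returns only the best local score; computing percent identity would additionally require a traceback.

```python
# Illustrative sketch of Smith-Waterman scoring with affine gaps (Gotoh
# recurrences). Match/mismatch scores are assumed, not taken from the text.
def smith_waterman_affine(a, b, match=5, mismatch=-4, gap_open=12, gap_extend=1):
    """Return the best local alignment score between sequences a and b."""
    neg_inf = float("-inf")
    m = len(b)
    prev_h = [0] * (m + 1)        # best score ending at (i-1, j)
    prev_f = [neg_inf] * (m + 1)  # best score ending at (i-1, j) in a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        cur_h = [0] * (m + 1)
        cur_f = [neg_inf] * (m + 1)
        e = neg_inf               # best score ending at (i, j) in a horizontal gap
        for j in range(1, m + 1):
            # Open a new gap from H, or extend the gap that is already open.
            e = max(cur_h[j - 1] - gap_open, e - gap_extend)
            cur_f[j] = max(prev_h[j] - gap_open, prev_f[j] - gap_extend)
            diag = prev_h[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            cur_h[j] = max(0, diag, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best


query = "ACGTTGCAAGCTTAGC"      # hypothetical query
subject = "ACGTTGCATAGCTTAGC"   # same sequence with one inserted base
score = smith_waterman_affine(query, subject)
```

With these assumed scores, the single inserted base is bridged by one gap (16 matches at +5 minus one gap open of 12), which scores higher than either ungapped half alone.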

The subject nucleic acids can be cDNAs or genomic DNAs, as well as fragments thereof, 
particularly fragments that encode a biologically active gene product and/or are useful in the methods 
disclosed herein (e.g., in diagnosis, as a unique identifier of a differentially expressed gene of interest, 
etc.). The term "cDNA" as used herein is intended to include all nucleic acids that share the arrangement
of sequence elements found in native mature mRNA species, where sequence elements are exons and 3'
and 5' non-coding regions. Normally mRNA species have contiguous exons, with the intervening
introns, when present, being removed by nuclear RNA splicing, to create a continuous open reading 
frame encoding a polypeptide of the invention. 

A genomic sequence of interest comprises the nucleic acid present between the initiation codon 
and the stop codon, as defined in the listed sequences, including all of the introns that are normally 
present in a native chromosome. It can further include the 3' and 5' untranslated regions found in the
mature mRNA. It can further include specific transcriptional and translational regulatory sequences,
such as promoters, enhancers, etc., including about 1 kb, but possibly more, of flanking genomic DNA at
either the 5' or 3' end of the transcribed region. The genomic DNA can be isolated as a fragment of
100 kbp or smaller, and substantially free of flanking chromosomal sequence. The genomic DNA
flanking the coding region, either 3' or 5', or internal regulatory sequences as sometimes found in
introns, contains sequences required for proper tissue, stage-specific, or disease-state specific 
expression. 

The nucleic acid compositions of the subject invention can encode all or a part of the subject 
polypeptides. Double or single stranded fragments can be obtained from the DNA sequence by 
chemically synthesizing oligonucleotides in accordance with conventional methods, by restriction 
enzyme digestion, by PCR amplification, etc. Isolated polynucleotides and polynucleotide fragments of 
the invention comprise at least about 10, about 15, about 20, about 35, about 50, about 100, about 150 
to about 200, about 250 to about 300, or about 350 contiguous nt selected from the polynucleotide 
sequences as shown in SEQ ID NOS: 1-2396. For the most part, fragments will be of at least 15 nt,
usually at least 18 nt or 25 nt, and up to at least about 50 contiguous nt in length or more. In a preferred 
embodiment, the polynucleotide molecules comprise a contiguous sequence of at least 12 nt selected 
from the group consisting of the polynucleotides shown in SEQ ID NOS: 1-2396.

Probes specific to the polynucleotides of the invention can be generated using the polynucleotide
sequences disclosed in SEQ ID NOS: 1-2396. The probes are preferably at least about a 12, 15, 16, 18,
20, 22, 24, or 25 nt fragment of a corresponding contiguous sequence of SEQ ID NOS: 1-2396, and can
be less than 2, 1, 0.5, 0.1, or 0.05 kb in length. The probes can be synthesized chemically or can be
generated from longer polynucleotides using restriction enzymes. The probes can be labeled, for
example, with a radioactive, biotinylated, or fluorescent tag. Preferably, probes are designed based upon
an identifying sequence of a polynucleotide of one of SEQ ID NOS: 1-2396. More preferably, probes are
designed based on a contiguous sequence of one of the subject polynucleotides that remains unmasked
following application of a masking program for masking low complexity (e.g., XBLAST) to the
sequence, i.e., one would select an unmasked region, as indicated by the polynucleotides outside the
poly-n stretches of the masked sequence produced by the masking program.
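
The selection of unmasked regions can be illustrated with a short Python sketch. It assumes, as described above, that the masking program has replaced low-complexity stretches with runs of "N" (or "n"); the function name, the minimum length of 12 nt (chosen from the probe lengths listed above), and the example sequence are hypothetical, and the sketch does not perform the masking itself.

```python
# Illustrative sketch: locate unmasked stretches in a sequence whose
# low-complexity regions were already replaced by runs of N.
import re


def unmasked_regions(masked_seq, min_len=12):
    """Return (start, end, subsequence) for each run of non-N residues
    at least min_len long; start is 0-based and end is exclusive."""
    regions = []
    for m in re.finditer(r"[^Nn]+", masked_seq):
        if m.end() - m.start() >= min_len:
            regions.append((m.start(), m.end(), m.group()))
    return regions


# Hypothetical masked sequence: 14 nt, 8 masked, 7 nt, 4 masked, 16 nt.
masked = "ACGTACGTACGTAC" + "N" * 8 + "GATTACA" + "N" * 4 + "TTGACCGGTAACGTTA"
candidates = unmasked_regions(masked)
```

In this example the 7 nt stretch between the two masked runs is too short to serve as a probe and is skipped, leaving two candidate regions.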

The polynucleotides of the subject invention are isolated and obtained in substantial purity,
generally as other than an intact chromosome. Usually, the polynucleotides, either as DNA or RNA, will
be obtained substantially free of other naturally-occurring nucleic acid sequences, generally being at least
about 50%, usually at least about 90% pure and are typically "recombinant", e.g., flanked by one or
more nucleotides with which it is not normally associated on a naturally occurring chromosome.

The polynucleotides of the invention can be provided as a linear molecule or within a circular
molecule, and can be provided within autonomously replicating molecules (vectors) or within molecules
without replication sequences. Expression of the polynucleotides can be regulated by their own or by
other regulatory sequences known in the art. The polynucleotides of the invention can be introduced into
suitable host cells using a variety of techniques available in the art, such as transferrin
polycation-mediated DNA transfer, transfection with naked or encapsulated nucleic acids, liposome-mediated DNA
transfer, intracellular transportation of DNA-coated latex beads, protoplast fusion, viral infection,
electroporation, gene gun, calcium phosphate-mediated transfection, and the like.

The subject nucleic acid compositions can be used to, for example, produce polypeptides, as
probes for the detection of mRNA of the invention in biological samples (e.g., extracts of human cells),
to generate additional copies of the polynucleotides, to generate ribozymes or antisense oligonucleotides,
and as single stranded DNA probes or as triple-strand forming oligonucleotides. The probes described
herein can be used to, for example, determine the presence or absence of the polynucleotide sequences as
shown in SEQ ID NOS: 1-2396 or variants thereof in a sample. These and other uses are described in
more detail below.

Use of Polynucleotides to Obtain Full-Length cDNA, Gene, and Promoter Region 

Full-length cDNA molecules comprising the disclosed polynucleotides are obtained as follows. 
A polynucleotide having a sequence of one of SEQ ID NOS: 1-2396, or a portion thereof comprising at
least 12, 15, 18, or 20 nt, is used as a hybridization probe to detect hybridizing members of a cDNA
library using probe design methods, cloning methods, and clone selection techniques such as those 
described in USPN 5,654,173. Libraries of cDNA are made from selected tissues, such as normal or 
tumor tissue, or from tissues of a mammal treated with, for example, a pharmaceutical agent. Preferably, 
the tissue is the same as the tissue from which the polynucleotides of the invention were isolated, as both 
the polynucleotides described herein and the cDNA represent expressed genes. Most preferably, the 
cDNA library is made from the biological material described herein in the Examples. The choice of cell 
type for library construction can be made after the identity of the protein encoded by the gene 
corresponding to the polynucleotide of the invention is known. This will indicate which tissue and cell 
types are likely to express the related gene, and thus represent a suitable source for the mRNA for
generating the cDNA. Where the provided polynucleotides are isolated from cDNA libraries, the 
libraries are prepared from mRNA of human colon cells, more preferably, human colon cancer cells, 
even more preferably, from a highly metastatic colon cell, Km12L4.

Techniques for producing and probing nucleic acid sequence libraries are described, for 
example, in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., (1989) Cold Spring
Harbor Press, Cold Spring Harbor, NY. The cDNA can be prepared by using primers based on sequence
from SEQ ID NOS: 1-2396. In one embodiment, the cDNA library can be made from only
polyadenylated mRNA. Thus, poly-T primers can be used to prepare cDNA from the mRNA.

Members of the library that are larger than the provided polynucleotides, and preferably that 
encompass the complete coding sequence of the native message, are obtained. In order to confirm that 
the entire cDNA has been obtained, RNA protection experiments are performed as follows. 
Hybridization of a full-length cDNA to an mRNA will protect the RNA from RNase degradation. If the 
cDNA is not full length, then the portions of the mRNA that are not hybridized will be subject to RNase 
degradation. This is assayed, as is known in the art, by changes in electrophoretic mobility on
polyacrylamide gels, or by detection of released monoribonucleotides. Sambrook et al., Molecular
Cloning: A Laboratory Manual, 2nd Ed, (1989) Cold Spring Harbor Press, Cold Spring Harbor, NY. 
In order to obtain additional sequences 5' to the end of a partial cDNA, 5' RACE (PCR Protocols: A 
Guide to Methods and Applications, (1990) Academic Press, Inc.) can be performed. 

Genomic DNA is isolated using the provided polynucleotides in a manner similar to the isolation 
of full-length cDNAs. Briefly, the provided polynucleotides, or portions thereof, are used as probes to 
libraries of genomic DNA. Preferably, the library is obtained from the cell type that was used to 
generate the polynucleotides of the invention, but this is not essential. Most preferably, the genomic 
DNA is obtained from the biological material described herein in the Examples. Such libraries can be in 
vectors suitable for carrying large segments of a genome, such as P1 or YAC, as described in detail in
Sambrook et al., 9.4-9.30. In addition, genomic sequences can be isolated from human BAC libraries,
which are commercially available from Research Genetics, Inc., Huntsville, Alabama, USA, for example.
In order to obtain additional 5' or 3' sequences, chromosome walking is performed, as described in
Sambrook et al., such that adjacent and overlapping fragments of genomic DNA are isolated. These are
mapped and pieced together, as is known in the art, using restriction digestion enzymes and DNA ligase.

Using the polynucleotide sequences of the invention, corresponding full-length genes can be
isolated using both classical and PCR methods to construct and probe cDNA libraries. Using either
method, Northern blots, preferably, are performed on a number of cell types to determine which cell lines
express the gene of interest at the highest level. Classical methods of constructing cDNA libraries are
taught in Sambrook et al., supra. With these methods, cDNA can be produced from mRNA and inserted
into viral or expression vectors. Typically, libraries of mRNA comprising poly(A) tails can be produced
with poly(T) primers. Similarly, cDNA libraries can be produced using the instant sequences as primers.

PCR methods are used to amplify the members of a cDNA library that comprise the desired
insert. In this case, the desired insert will contain sequence from the full length cDNA that corresponds
to the instant polynucleotides. Such PCR methods include gene trapping and RACE methods. Gene
trapping entails inserting a member of a cDNA library into a vector. The vector then is denatured to
produce single stranded molecules. Next, a substrate-bound probe, such as a biotinylated oligo, is used to
trap cDNA inserts of interest. Biotinylated probes can be linked to an avidin-bound solid substrate.
PCR methods can be used to amplify the trapped cDNA. To trap sequences corresponding to the full
length genes, the labeled probe sequence is based on the polynucleotide sequences of the invention.
Random primers or primers specific to the library vector can be used to amplify the trapped cDNA.
Such gene trapping techniques are described in Gruber et al., WO 95/04745 and Gruber et al., USPN
5,500,356. Kits are commercially available to perform gene trapping experiments from, for example,
Life Technologies, Gaithersburg, Maryland, USA.

"Rapid amplification of cDNA ends " or RACE, is a PCR method of amplifying cDNAs from a 
number of different RNAs. The cDNAs are ligated to an oligonucleotide linker, and amplified by PCR
using two primers. One primer is based on sequence from the instant polynucleotides, for which full
length sequence is desired, and a second primer comprises sequence that hybridizes to the
oligonucleotide linker to amplify the cDNA. A description of this method is reported in WO 97/19110.
In preferred embodiments of RACE, a common primer is designed to anneal to an arbitrary adaptor
sequence ligated to cDNA ends (Apte and Siebert, Biotechniques (1993) 15:890-893; Edwards et al.,
Nuc. Acids Res. (1991) 19:5227-5232). When a single gene-specific RACE primer is paired with the
common primer, preferential amplification of sequences between the single gene specific primer and the
common primer occurs. Commercial cDNA pools modified for use in RACE are available.

Another PCR-based method generates a full-length cDNA library with anchored ends without
needing specific knowledge of the cDNA sequence. The method uses lock-docking primers (I-VI), where
one primer, polyTV (I-III), locks over the polyA tail of eukaryotic mRNA producing first strand
synthesis and a second primer, polyGH (IV-VI), locks onto the polyC tail added by terminal
deoxynucleotidyl transferase (TdT) (see, e.g., WO 96/40998).

The promoter region of a gene generally is located 5' to the initiation site for RNA polymerase
II. Hundreds of promoter regions contain the "TATA" box, a sequence such as TATTA or TATAA,
which is sensitive to mutations. The promoter region can be obtained by performing 5' RACE using a
primer from the coding region of the gene. Alternatively, the cDNA can be used as a probe for the
genomic sequence, and the region 5' to the coding region is identified by "walking up." If the gene is
highly expressed or differentially expressed, the promoter from the gene can be of use in a regulatory
construct for a heterologous gene.
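
As a simple illustration, a candidate 5' flanking region can be scanned for the TATA-like sequences named above (TATAA and TATTA) with the following Python sketch. The function name and the example sequence are hypothetical; an exact-string scan of this kind only flags candidates and is not a promoter prediction method.

```python
# Illustrative sketch: exact-match scan for TATA-like motifs in a
# candidate upstream sequence. Names and sequence are hypothetical.
def find_tata_like(seq, motifs=("TATAA", "TATTA")):
    """Return a sorted list of (0-based position, motif) for every
    occurrence of each motif, including overlapping occurrences."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((start, motif))
            start = seq.find(motif, start + 1)
    return sorted(hits)


upstream = "GGCCTATAAAGGCGCTATTACC"  # hypothetical 5' flanking sequence
tata_hits = find_tata_like(upstream)
```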

Once the full-length cDNA or gene is obtained, DNA encoding variants can be prepared by
site-directed mutagenesis, described in detail in Sambrook et al., 15.3-15.63. The choice of codon or
nucleotide to be replaced can be based on disclosure herein on optional changes in amino acids to
achieve altered protein structure and/or function.

As an alternative method to obtaining DNA or RNA from a biological material, nucleic acid
comprising nucleotides having the sequence of one or more polynucleotides of the invention can be
synthesized. Thus, the invention encompasses nucleic acid molecules ranging in length from 15 nt
(corresponding to at least 15 contiguous nt of one of SEQ ID NOS: 1-2396) up to a maximum length
suitable for one or more biological manipulations, including replication and expression, of the nucleic
acid molecule. The invention includes but is not limited to (a) nucleic acid having the size of a full gene,
and comprising at least one of SEQ ID NOS: 1-2396; (b) the nucleic acid of (a) also comprising at least
one additional gene, operably linked to permit expression of a fusion protein; (c) an expression vector
comprising (a) or (b); (d) a plasmid comprising (a) or (b); and (e) a recombinant viral particle
comprising (a) or (b). Once provided with the polynucleotides disclosed herein, construction or
preparation of (a)-(e) is well within the skill in the art.

The sequence of a nucleic acid comprising at least 15 contiguous nt of at least any one of SEQ
ID NOS: 1-2396, preferably the entire sequence of at least any one of SEQ ID NOS: 1-2396, is not
limited and can be any sequence of A, T, G, and/or C (for DNA) and A, U, G, and/or C (for RNA) or
modified bases thereof, including inosine and pseudouridine. The choice of sequence will depend on the
desired function and can be dictated by coding regions desired, the intron-like regions desired, and the
regulatory regions desired. Where the entire sequence of any one of SEQ ID NOS: 1-2396 is within the
nucleic acid, the nucleic acid obtained is referred to herein as a polynucleotide comprising the sequence
of any one of SEQ ID NOS: 1-2396.

Expression of Polypeptide Encoded by Full-Length cDNA or Full-Length Gene

The provided polynucleotides (e.g., a polynucleotide having a sequence of one of SEQ ID
NOS: 1-2396), the corresponding cDNA, or the full-length gene is used to express a partial or complete
gene product. Constructs of polynucleotides having sequences of SEQ ID NOS: 1-2396 can also be
generated synthetically. Alternatively, single-step assembly of a gene and entire plasmid from large
numbers of oligodeoxyribonucleotides is described by, e.g., Stemmer et al., Gene (Amsterdam) (1995)
164(1):49-53. In this method, assembly PCR (the synthesis of long DNA sequences from large numbers
of oligodeoxyribonucleotides (oligos)) is described. The method is derived from DNA shuffling
(Stemmer, Nature (1994) 370:389-391), and does not rely on DNA ligase, but instead relies on DNA
polymerase to build increasingly longer DNA fragments during the assembly process.

Appropriate polynucleotide constructs are purified using standard recombinant DNA techniques
as described in, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed.,
(1989) Cold Spring Harbor Press, Cold Spring Harbor, NY, and under current regulations described in
United States Dept. of HHS, National Institutes of Health (NIH) Guidelines for Recombinant DNA
Research. The gene product encoded by a polynucleotide of the invention is expressed in any expression
system, including, for example, bacterial, yeast, insect, amphibian and mammalian systems. Vectors,
host cells and methods for obtaining expression in same are well known in the art. Suitable vectors and
host cells are described in USPN 5,654,173.

Polynucleotide molecules comprising a polynucleotide sequence provided herein are generally
propagated by placing the molecule in a vector. Viral and non-viral vectors are used, including plasmids.
The choice of plasmid will depend on the type of cell in which propagation is desired and the purpose of
propagation. Certain vectors are useful for amplifying and making large amounts of the desired DNA
sequence. Other vectors are suitable for expression in cells in culture. Still other vectors are suitable for
transfer and expression in cells in a whole animal or person. The choice of appropriate vector is well
within the skill of the art. Many such vectors are available commercially. Methods for preparation of
vectors comprising a desired sequence are well known in the art.

The polynucleotides set forth in SEQ ID NOS: 1-2396 or their corresponding full-length
polynucleotides are linked to regulatory sequences as appropriate to obtain the desired expression
properties. These can include promoters (attached either at the 5' end of the sense strand or at the 3' end
of the antisense strand), enhancers, terminators, operators, repressors, and inducers. The promoters can
be regulated or constitutive. In some situations it may be desirable to use conditionally active promoters,
such as tissue-specific or developmental stage-specific promoters. These are linked to the desired
nucleotide sequence using the techniques described above for linkage to vectors. Any techniques known
in the art can be used.

When any of the above host cells, or other appropriate host cells or organisms, are used to
replicate and/or express the polynucleotides or nucleic acids of the invention, the resulting replicated
nucleic acid, RNA, expressed protein or polypeptide, is within the scope of the invention as a product of
the host cell or organism. The product is recovered by any appropriate means known in the art.

Once the gene corresponding to a selected polynucleotide is identified, its expression can be
regulated in the cell to which the gene is native. For example, an endogenous gene of a cell can be
regulated by an exogenous regulatory sequence as disclosed in USPN 5,641,670.

Identification of Functional and Structural Motifs of Novel Genes
Screening Against Publicly Available Databases

Translations of the nucleotide sequence of the provided polynucleotides, cDNAs or full genes
can be aligned with individual known sequences. Similarity with individual sequences can be used to
determine the activity of the polypeptides encoded by the polynucleotides of the invention. Also,
sequences exhibiting similarity with more than one individual sequence can exhibit activities that are
characteristic of either or both individual sequences.

The full length sequences and fragments of the polynucleotide sequences of the nearest
neighbors can be used as probes and primers to identify and isolate the full length sequence
corresponding to provided polynucleotides. The nearest neighbors can indicate a tissue or cell type to be
used to construct a library for the full-length sequences corresponding to the provided polynucleotides.

Typically, a selected polynucleotide is translated in all six frames to determine the best
alignment with the individual sequences. The sequences disclosed herein in the Sequence Listing are in a
5' to 3' orientation and translation in three frames can be sufficient (with a few specific exceptions as
described in the Examples). These amino acid sequences are referred to, generally, as query sequences,
which will be aligned with the individual sequences. Databases with individual sequences are described
in "Computer Methods for Macromolecular Sequence Analysis" Methods in Enzymology (1996) 266,
Doolittle, Academic Press, Inc., a division of Harcourt Brace & Co., San Diego, California, USA.
Databases include GenBank, EMBL, and DNA Database of Japan (DDBJ).
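
The six-frame translation used to produce query sequences can be sketched in Python as follows. The sketch assumes the standard genetic code and DNA input containing only A, C, G, and T; the function names, the frame labels (+1 to +3 for the given strand, -1 to -3 for the reverse complement), and the example sequence are hypothetical. Codons containing any other character are translated as "X", and stop codons as "*".

```python
# Illustrative sketch: translate a DNA sequence in all six reading frames
# using the standard genetic code. Names and labels are hypothetical.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codons enumerated in TCAG order (TTT, TTC, TTA, ...) map onto AMINO.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the first base; trailing bases are ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame(seq):
    """Return a dict mapping frame labels to translated sequences."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(seq[offset:])
        frames["-%d" % (offset + 1)] = translate(rc[offset:])
    return frames


frames = six_frame("ATGGCCTAA")  # hypothetical 9-nt sequence
```

Where the orientation of a sequence is known to be 5' to 3', only the three forward frames ("+1" to "+3") need to be used as query sequences.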

Query and individual sequences can be aligned using the methods and computer programs
described above, and include BLAST 2.0 (National Center for Biotechnology Information,
Bethesda, Maryland). See also Altschul, et al. Nucleic Acids Res. (1997) 25:3389-3402. Another
alignment algorithm is Fasta, available in the Genetics Computing Group (GCG) package, Madison,
Wisconsin, USA, a wholly owned subsidiary of Oxford Molecular Group, Inc. Other techniques for
alignment are described in Doolittle, supra. Preferably, an alignment program that permits gaps in the
sequence is utilized to align the sequences. The Smith-Waterman algorithm is one type of algorithm that permits
gaps in sequence alignments. See Meth. Mol. Biol. (1997) 70:173-187. Also, the GAP program using
the Needleman and Wunsch alignment method can be utilized to align sequences. An alternative search
strategy uses MPSRCH software, which runs on a MASPAR computer. MPSRCH uses a
Smith-Waterman algorithm to score sequences on a massively parallel computer. This approach improves
ability to identify sequences that are distantly related matches, and is especially tolerant of small gaps
and nucleotide sequence errors. Amino acid sequences encoded by the provided polynucleotides can be
used to search both protein and DNA databases. Incorporated herein by reference are all sequences that
have been made public as of the filing date of this application by any of the DNA or protein sequence
databases, including the patent databases (e.g., GeneSeq). Also incorporated by reference are those
sequences that have been submitted to these databases as of the filing date of the present application but
not made public until after the filing date of the present application.

Results of individual and query sequence alignments can be divided into three categories: high 
similarity, weak similarity, and no similarity. Individual alignment results ranging from high similarity 
to weak similarity provide a basis for determining polypeptide activity and/or structure. Parameters for 
categorizing individual results include: percentage of the alignment region length where the strongest 
alignment is found, percent sequence identity, and p value. The percentage of the alignment region 
length is calculated by counting the number of residues of the individual sequence found in the region of 
strongest alignment, e.g., contiguous region of the individual sequence that contains the greatest number 
of residues that are identical to the residues of the corresponding region of the aligned query sequence. 
This number is divided by the total residue length of the query sequence to calculate a percentage. For 
example, a query sequence of 20 amino acid residues might be aligned with a 20 amino acid region of an 
individual sequence. The individual sequence might be identical to amino acid residues 5, 9-15, and 17-19 
of the query sequence. The region of strongest alignment is thus the region stretching from residue 9-19, 
an 11 amino acid stretch. The percentage of the alignment region length is: 11 (length of the region 
of strongest alignment) divided by (query sequence length) 20 or 55%. 

Percent sequence identity is calculated by counting the number of amino acid matches between 
the query and individual sequence and dividing total number of matches by the number of residues of the 
individual sequences found in the region of strongest alignment. Thus, the percent identity in the 
example above would be 10 matches divided by 11 amino acids, or approximately 90.9%. 
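
The two calculations in the worked example can be reproduced with a short sketch. The function name `alignment_metrics` is hypothetical, and the boundaries of the region of strongest alignment (residues 9-19) are taken as given from the example rather than computed.

```python
def alignment_metrics(identical_positions, region_start, region_end, query_length):
    """Return (percent of alignment region length, percent sequence identity).

    Positions are 1-based and inclusive, as in the worked example.
    """
    region_length = region_end - region_start + 1
    # Count only the identities that fall inside the region of strongest alignment.
    matches = sum(1 for p in identical_positions if region_start <= p <= region_end)
    pct_region = 100.0 * region_length / query_length
    pct_identity = 100.0 * matches / region_length
    return pct_region, pct_identity


# Worked example from the text: identities at residues 5, 9-15 and 17-19
# of a 20-residue query, with the strongest region spanning residues 9-19.
identical = [5, *range(9, 16), *range(17, 20)]
pct_region, pct_identity = alignment_metrics(identical, 9, 19, 20)
print(f"{pct_region:.1f}% of query length, {pct_identity:.1f}% identity")
# prints "55.0% of query length, 90.9% identity"
```

The identity at residue 5 lies outside the region and is therefore not counted, which is why the numerator is 10 rather than 11.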

P value is the probability that the alignment was produced by chance. For a single alignment, 
the p value can be calculated according to Karlin et al., Proc. Natl. Acad. Sci. (1990) 87:2264 and 
Karlin et al., Proc. Natl. Acad. Sci. (1993) 90. The p value of multiple alignments using the same query 
sequence can be calculated using a heuristic approach described in Altschul et al., Nat. Genet. (1994) 
6:119. Alignment programs such as the BLAST program can calculate the p value. See also Altschul et al., 
Nucleic Acids Res. (1997) 25:3389-3402. 
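
As a rough illustration of a single-alignment p value, the statistics of Karlin et al. are commonly summarized as E = K·m·n·exp(-lambda·S) and P = 1 - exp(-E), where S is the alignment score and m and n are the query and database lengths. The sketch below assumes that summary. The parameters lambda and K depend on the scoring system, so the values a caller passes in are placeholders, not values taken from this application.

```python
import math


def expect_value(score, m, n, lam, k):
    """Expected number of chance alignments scoring at least `score`.

    m, n: query and database lengths; lam, k: scoring-system parameters
    (assumed to be supplied by the caller).
    """
    return k * m * n * math.exp(-lam * score)


def p_value(score, m, n, lam, k):
    """Probability of at least one chance alignment scoring >= `score`."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-expect_value(score, m, n, lam, k))
```

When E is small the p value and E are nearly equal, so a threshold stated on one can be read as approximately the same threshold on the other.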

Another factor to consider for determining identity or similarity is the location of the similarity 
or identity. Strong local alignment can indicate similarity even if the length of alignment is short. 
Sequence identity scattered throughout the length of the query sequence also can indicate a similarity 
between the query and profile sequences. The boundaries of the region where the sequences align can be 
determined according to Doolittle, supra, BLAST 2.0 (see, e.g., Altschul, et al. Nucleic Acids Res. 
(1997) 25:3389-3402) or FAST programs; or by determining the area where sequence identity is 
highest. 

High Similarity. In general, in alignment results considered to be of high similarity, the percent 
of the alignment region length is typically at least about 55% of the total query sequence length; more 
typically, at least about 58%; even more typically, at least about 60% of the total residue length of the 
query sequence. Usually, percent length of the alignment region can be as much as about 62%; more 
usually, as much as about 64%; even more usually, as much as about 66%. Further, for high similarity, 
the region of alignment, typically, exhibits at least about 75% of sequence identity; more typically, at 
least about 78%; even more typically, at least about 80% sequence identity. Usually, percent sequence 
identity can be as much as about 82%; more usually, as much as about 84%; even more usually, as much 
as about 86%. 

The p value is used in conjunction with these methods. If high similarity is found, the query 
sequence is considered to have high similarity with a profile sequence when the p value is less than or 
equal to about 10^-2; more usually, less than or equal to about 10^-3; even more usually, less than or equal 
to about 10^-4. More typically, the p value is no more than about 10^-5; more typically, no more than or 
equal to about 10^-10; even more typically, no more than or equal to about 10^-15 for the query sequence 
to be considered high similarity. 

Weak Similarity. In general, where alignment results are considered to be of weak similarity, there 
is no minimum percent length of the alignment region nor minimum length of alignment. A better 
showing of weak similarity is considered when the region of alignment is, typically, at least about 15 
amino acid residues in length; more typically, at least about 20; even more typically, at least about 25 
amino acid residues in length. Usually, length of the alignment region can be as much as about 30 amino 
acid residues; more usually, as much as about 40; even more usually, as much as about 60 amino acid 
residues. Further, for weak similarity, the region of alignment, typically, exhibits at least about 35% of 
sequence identity; more typically, at least about 40%; even more typically, at least about 45% sequence 
identity. Usually, percent sequence identity can be as much as about 50%; more usually, as much as 
about 55%; even more usually, as much as about 60%. 

If low similarity is found, the query sequence is considered to have weak similarity with a profile 
sequence when the p value is usually less than or equal to about 10^-2; more usually, less than or equal to 
about 10^-3; even more usually, less than or equal to about 10^-4. More typically, the p value is no more 
than about 10^-5; more usually, no more than or equal to about 10^-10; even more usually, no more than 
or equal to about 10^-15 for the query sequence to be considered weak similarity. 
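
The three categories can be illustrated with a small classifier. This is a deliberate simplification: the text gives graded "at least about" ranges rather than a decision procedure, and the sketch assumes that the broadest stated values (55% alignment region length, 75% identity, 35% identity, and a p value of about 10^-2) can be treated as hard cut-offs. The function name `classify_similarity` is hypothetical.

```python
def classify_similarity(pct_region, pct_identity, p_value):
    """Classify an alignment as 'high', 'weak' or 'none'.

    Assumed cut-offs (broadest values stated in the text):
      high: region length >= 55% of query, identity >= 75%, p <= 1e-2
      weak: identity >= 35%, p <= 1e-2 (no minimum region length)
    """
    if p_value > 1e-2:
        return "none"
    if pct_region >= 55.0 and pct_identity >= 75.0:
        return "high"
    if pct_identity >= 35.0:
        return "weak"
    return "none"
```

A stricter screen would substitute the narrower values from the same passages, for example 60% region length, 80% identity and a p value of 10^-5.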

Similarity Determined by Sequence Identity Alone. Sequence identity alone can be used to 
determine similarity of a query sequence to an individual sequence and can indicate the activity of the 
sequence. Such an alignment, preferably, permits gaps to align sequences. Typically, the query 
sequence is related to the profile sequence if the sequence identity over the entire query sequence is at 
least about 15%; more typically, at least about 20%; even more typically, at least about 25%; even more 
typically, at least about 50%. Sequence identity alone as a measure of similarity is most useful when the 
query sequence is usually, at least 80 residues in length; more usually, 90 residues; even more usually, at 
least 95 amino acid residues in length. More typically, similarity can be concluded based on sequence 
identity alone when the query sequence is preferably 100 residues in length; more preferably, 120 
residues in length; even more preferably, 150 amino acid residues in length. 

Alignments with Profile and Multiple Aligned Sequences. Translations of the provided 
polynucleotides can be aligned with amino acid profiles that define either protein families or common 
motifs. Also, translations of the provided polynucleotides can be aligned to multiple sequence 
alignments (MSA) comprising the polypeptide sequences of members of protein families or motifs. 
Similarity or identity with profile sequences or MSAs can be used to determine the activity of the gene 
products (e.g., polypeptides) encoded by the provided polynucleotides or corresponding cDNA or genes. 
For example, sequences that show an identity or similarity with a chemokine profile or MSA can exhibit 
chemokine activities. 

Profiles can be designed manually by (1) creating an MSA, which is an alignment of the amino acid 
sequence of members that belong to the family and (2) constructing a statistical representation of the 
alignment. Such methods are described, for example, in Birney et al., Nucl. Acid Res. (1996) 24(14): 
2730-2739. MSAs of some protein families and motifs are publicly available. For example, the Pfam 
database available from Washington University (St. Louis, Missouri) includes MSAs of 547 
different families and motifs. These MSAs are described also in Sonnhammer et al., Proteins (1997) 
28:405-420. Other publicly available sources include those over the world wide web provided by the 
European Molecular Biology Laboratory (Heidelberg, Germany). A brief description of these MSAs is 
reported in Pascarella et al., Prot. Eng. (1996) 9(3):249-251. Techniques for building profiles from 
MSAs are described in Sonnhammer et al., supra; Birney et al., supra; and "Computer Methods for 
Macromolecular Sequence Analysis," Methods in Enzymology (1996) 266, Doolittle, Academic Press, 
Inc., San Diego, California, USA. 

Similarity between a query sequence and a protein family or motif can be determined by (a) 
comparing the query sequence against the profile and/or (b) aligning the query sequence with the 
members of the family or motif. Typically, a program such as Searchwise is used to compare the query 
sequence to the statistical representation of the multiple alignment, also known as a profile (see Birney et 
al., supra). Other techniques to compare the sequence and profile are described in Sonnhammer et al., 
supra and Doolittle, supra. 

Next, methods described by Feng et al., J. Mol. Evol. (1987) 25:351 and Higgins et al., 
CABIOS (1989) 5:151 can be used to align the query sequence with the members of a family or motif, also 
known as an MSA. Sequence alignments can be generated using any of a variety of software tools. 
Examples include PileUp, which creates a multiple sequence alignment, and is described in Feng et al., J. 
Mol. Evol. (1987) 25:351. Another method, GAP, uses the alignment method of Needleman et al., J. 
Mol. Biol. (1970) 48:443. GAP is best suited for global alignment of sequences. A third method, 
BestFit, functions by inserting gaps to maximize the number of matches using the local homology 
algorithm of Smith et al., Adv. Appl. Math. (1981) 2:482. In general, the following factors are used to 
determine if a similarity between a query sequence and a profile or MSA exists: (1) number of 
conserved residues found in the query sequence, (2) percentage of conserved residues found in the query 
sequence, (3) number of frameshifts, and (4) spacing between conserved residues. 

Some alignment programs that both translate and align sequences can make any number of 
frameshifts when translating the nucleotide sequence to produce the best alignment. The fewer 
frameshifts needed to produce an alignment, the stronger the similarity or identity between the query and 
profile or MSAs. For example, a weak similarity resulting from no frameshifts can be a better indication 
of activity or structure of a query sequence, than a strong similarity resulting from two frameshifts. 
Preferably, three or fewer frameshifts are found in an alignment; more preferably two or fewer 
frameshifts; even more preferably, one or fewer frameshifts; even more preferably, no frameshifts are 
found in an alignment of query and profile or MSAs. 

Conserved residues are those amino acids found at a particular position in all or some of the 
family or motif members. Alternatively, a position is considered conserved if only a certain class of 
amino acids is found in a particular position in all or some of the family members. For example, the 
N-terminal position can contain a positively charged amino acid, such as lysine, arginine, or histidine. 

Typically, a residue of a polypeptide is conserved when a class of amino acids or a single amino 
acid is found at a particular position in at least about 40% of all class members; more typically, at least 
about 50%; even more typically, at least about 60% of the members. Usually, a residue is conserved 
when a class or single amino acid is found in at least about 70% of the members of a family or motif; 
more usually, at least about 80%; even more usually, at least about 90%; even more usually, at least 
about 95%. 

A residue is also considered conserved when three unrelated amino acids are found at a particular 
position in some or all of the members; more usually, two unrelated amino acids. These residues are 
conserved when the unrelated amino acids are found at particular positions in at least about 40% of all 
class members; more typically, at least about 50%; even more typically, at least about 60% of the 
members. Usually, a residue is conserved when a class or single amino acid is found in at least about 
70% of the members of a family or motif; more usually, at least about 80%; even more usually, at least 
about 90%; even more usually, at least about 95%. 

A query sequence has similarity to a profile or MSA when the query sequence comprises at least 
about 25% of the conserved residues of the profile or MSA; more usually, at least about 30%; even more 
usually, at least about 40%. Typically, the query sequence has a stronger similarity to a profile sequence 
or MSA when the query sequence comprises at least about 45% of the conserved residues of the profile 
or MSA; more typically, at least about 50%; even more typically, at least about 55%. 
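
The conserved-residue criteria above can be sketched in a few lines. The sketch is illustrative only: it handles the single-amino-acid case (not amino acid classes or sets of unrelated residues), it assumes the query has already been aligned to the MSA so that column indices correspond, and the function names are hypothetical.

```python
from collections import Counter


def conserved_positions(msa, min_fraction=0.4):
    """Map column index -> conserved residue.

    A column is treated as conserved when its most common non-gap residue
    occurs in at least `min_fraction` of the aligned family members.
    `msa` is a list of equal-length aligned sequences using '-' for gaps.
    """
    conserved = {}
    for col in range(len(msa[0])):
        counts = Counter(seq[col] for seq in msa if seq[col] != "-")
        if not counts:
            continue
        residue, count = counts.most_common(1)[0]
        if count / len(msa) >= min_fraction:
            conserved[col] = residue
    return conserved


def fraction_conserved_in_query(aligned_query, conserved):
    """Fraction of conserved positions at which the query carries the conserved residue."""
    if not conserved:
        return 0.0
    hits = sum(1 for col, res in conserved.items() if aligned_query[col] == res)
    return hits / len(conserved)
```

Under the thresholds in the text, a returned fraction of about 0.25 or more would indicate similarity, and about 0.45 or more a stronger similarity.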
Identification of Secreted & Membrane-Bound Polypeptides 

Both secreted and membrane-bound polypeptides of the present invention are of particular 
interest. For example, levels of secreted polypeptides can be assayed in body fluids that are convenient, 
such as blood, plasma, serum, and other body fluids such as urine, prostatic fluid and semen. 
Membrane-bound polypeptides are useful for constructing vaccine antigens or inducing an immune 
response. Such antigens would comprise all or part of the extracellular region of the membrane-bound 
polypeptides. Because both secreted and membrane-bound polypeptides comprise a fragment of 
contiguous hydrophobic amino acids, hydrophobicity predicting algorithms can be used to identify such 
polypeptides. 

A signal sequence is usually encoded by both secreted and membrane-bound polypeptide genes 
to direct a polypeptide to the surface of the cell. The signal sequence usually comprises a stretch of 
hydrophobic residues. Such signal sequences can fold into helical structures. Membrane-bound 
polypeptides typically comprise at least one transmembrane region that possesses a stretch of 
hydrophobic amino acids that can traverse the membrane. Some transmembrane regions also exhibit a 
helical structure. Hydrophobic fragments within a polypeptide can be identified by using computer 
algorithms. Such algorithms include Hopp & Woods, Proc. Natl. Acad. Sci. USA (1981) 78:3824-3828; 
Kyte & Doolittle, J. Mol. Biol. (1982) 157:105-132; and the RAOAR algorithm, Degli Esposti et al., Eur. 
J. Biochem. (1990) 190:207-219. 

Another method of identifying secreted and membrane-bound polypeptides is to translate the 
polynucleotides of the invention in all six frames and determine if at least 8 contiguous hydrophobic 
amino acids are present. Those translated polypeptides with at least 8; more typically, 10; even more 
typically, 12 contiguous hydrophobic amino acids are considered to be either a putative secreted or 
membrane bound polypeptide. Hydrophobic amino acids include alanine, glycine, histidine, isoleucine, 
leucine, lysine, methionine, phenylalanine, proline, threonine, tryptophan, tyrosine, and valine. 
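
The contiguous-stretch test can be sketched as a simple scan over a translated sequence. The set of hydrophobic residues below is exactly the list given in the text, written in one-letter codes; the function name `hydrophobic_stretches` is hypothetical.

```python
import re

# Hydrophobic residues as listed in the text (one-letter codes):
# Ala, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Thr, Trp, Tyr, Val.
HYDROPHOBIC = "AGHILKMFPTWYV"


def hydrophobic_stretches(protein, min_length=8):
    """Return (start_index, stretch) for every run of at least `min_length`
    contiguous hydrophobic residues; start_index is 0-based."""
    pattern = re.compile(f"[{HYDROPHOBIC}]{{{min_length},}}")
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]
```

Raising `min_length` to 10 or 12 applies the more typical thresholds. A translated frame with a non-empty result would be flagged as a putative secreted or membrane-bound polypeptide.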
Identification of the Function of an Expression Product of a Full-Length Gene 
Ribozymes, antisense constructs, and dominant negative mutants can be used to determine 
function of the expression product of a gene corresponding to a polynucleotide provided herein. These 
methods and compositions are particularly useful where the provided novel polynucleotide exhibits no 
significant or substantial homology to a sequence encoding a gene of known function. Antisense 
molecules and ribozymes can be constructed from synthetic polynucleotides. Typically, the 
phosphoramidite method of oligonucleotide synthesis is used. See Beaucage et al., Tet. Lett. (1981) 
22:1859 and USPN 4,668,777. Automated devices for synthesis are available to create oligonucleotides 
using this chemistry. Examples of such devices include Biosearch 8600, Models 392 and 394 by 
Applied Biosystems, a division of Perkin-Elmer Corp., Foster City, California, USA; and Expedite by 
Perceptive Biosystems, Framingham, Massachusetts, USA. Synthetic RNA, phosphate analog 
oligonucleotides, and chemically derivatized oligonucleotides can also be produced, and can be 
covalently attached to other molecules. RNA oligonucleotides can be synthesized, for example, using 
RNA phosphoramidites. This method can be performed on an automated synthesizer, such as Applied 
Biosystems, Models 392 and 394, Foster City, California, USA. 

Phosphorothioate oligonucleotides can also be synthesized for antisense construction. A 
sulfurizing reagent, such as tetraethylthiuram disulfide (TETD) in acetonitrile can be used to convert the 
internucleotide cyanoethyl phosphite to the phosphorothioate triester within 15 minutes at room 
temperature. TETD replaces the iodine reagent, while all other reagents used for standard 
phosphoramidite chemistry remain the same. Such a synthesis method can be automated using Models 
392 and 394 by Applied Biosystems, for example. 

Oligonucleotides of up to 200 nt can be synthesized, more typically, 100 nt, more typically 50 
nt; even more typically 30 to 40 nt. These synthetic fragments can be annealed and ligated together to 
construct larger fragments. See, for example, Sambrook et al., supra. Trans-cleaving catalytic RNAs 
(ribozymes) are RNA molecules possessing endoribonuclease activity. Ribozymes are specifically 
designed for a particular target, and the target message must contain a specific nucleotide sequence. 
They are engineered to cleave any RNA species site-specifically in the background of cellular RNA. The 
cleavage event renders the mRNA unstable and prevents protein expression. Importantly, ribozymes can 
be used to inhibit expression of a gene of unknown function for the purpose of determining its function 
in an in vitro or in vivo context, by detecting the phenotypic effect. One commonly used ribozyme motif 
is the hammerhead, for which the substrate sequence requirements are minimal. Design of the 
hammerhead ribozyme, as well as therapeutic uses of ribozymes, are disclosed in Usman et al., Current 
Opin. Struct. Biol. (1996) 6:527. Methods for production of ribozymes, including hairpin structure 
ribozyme fragments, methods of increasing ribozyme specificity, and the like are known in the art. 

The hybridizing region of the ribozyme can be modified or can be prepared as a branched 
structure as described in Horn and Urdea, Nucleic Acids Res. (1989) 17:6959. The basic structure of the 
ribozymes can also be chemically altered in ways familiar to those skilled in the art, and chemically 
synthesized ribozymes can be administered as synthetic oligonucleotide derivatives modified by 
monomeric units. In a therapeutic context, liposome mediated delivery of ribozymes improves cellular 
uptake, as described in Birikh et al., Eur. J. Biochem. (1997) 245:1. 

Antisense nucleic acids are designed to specifically bind to RNA, resulting in the formation of 
RNA-DNA or RNA-RNA hybrids, with an arrest of DNA replication, reverse transcription or messenger 
RNA translation. Antisense polynucleotides based on a selected polynucleotide sequence can interfere 
with expression of the corresponding gene. Antisense polynucleotides are typically generated within the 
cell by expression from antisense constructs that contain the antisense strand as the transcribed strand. 
Antisense polynucleotides based on the disclosed polynucleotides will bind and/or interfere with the 
translation of mRNA comprising a sequence complementary to the antisense polynucleotide. The 
expression products of control cells and cells treated with the antisense construct are compared to detect 
the protein product of the gene corresponding to the polynucleotide upon which the antisense construct is 
based. The protein is isolated and identified using routine biochemical methods. 
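
Because an antisense polynucleotide must be complementary to its target mRNA, a candidate sequence is simply the reverse complement of the chosen target region. The sketch below assumes a DNA antisense oligonucleotide and a target given as an RNA or DNA string; the function name `antisense_oligo` and its parameters are hypothetical.

```python
def antisense_oligo(mrna, start, length):
    """Return a DNA antisense oligonucleotide, written 5' to 3', that is
    complementary to mrna[start:start + length] (0-based start)."""
    pairs = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}
    target = mrna.upper()[start:start + length]
    # Complement each base and reverse so the result reads 5' to 3'.
    return "".join(pairs[base] for base in reversed(target))
```

For example, `antisense_oligo("AUGGCC", 0, 6)` returns `"GGCCAT"`. Choosing the target region and oligonucleotide length is a separate design question that this sketch does not address.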

Given the extensive background literature and clinical experience in antisense therapy, one 
skilled in the art can use selected polynucleotides of the invention as additional potential therapeutics. 
The choice of polynucleotide can be narrowed by first testing them for binding to "hot spot" regions of 
the genome of cancerous cells. If a polynucleotide is identified as binding to a "hot spot", testing the 
polynucleotide as an antisense compound in the corresponding cancer cells is warranted. 

As an alternative method for identifying function of the gene corresponding to a polynucleotide 
disclosed herein, dominant negative mutations are readily generated for corresponding proteins that are 
active as homomultimers. A mutant polypeptide will interact with wild-type polypeptides (made from 
the other allele) and form a non-functional multimer. Thus, a mutation can be made in a substrate-binding 
domain, a catalytic domain, or a cellular localization domain. Preferably, the mutant polypeptide will be 
overproduced. Point mutations are made that have such an effect. In addition, fusion of different 
polypeptides of various lengths to the terminus of a protein can yield dominant negative mutants. 
General strategies are available for making dominant negative mutants (see, e.g., Herskowitz, Nature 
(1987) 329:219). Such techniques can be used to create loss of function mutations, which are useful for 
determining protein function. 
Polypeptides and Variants Thereof 

The polypeptides of the invention include those encoded by the disclosed polynucleotides, as 
well as nucleic acids that, by virtue of the degeneracy of the genetic code, are not identical in sequence to 
the disclosed polynucleotides. Thus, the invention includes within its scope a polypeptide encoded by a 
polynucleotide having the sequence of any one of SEQ ID NOS: 1-2396 or a variant thereof. 

In general, the term "polypeptide" as used herein refers to the full length polypeptide 
encoded by the recited polynucleotide, the polypeptide encoded by the gene represented by the recited 
polynucleotide, as well as portions or fragments thereof. "Polypeptides" also includes variants of the 
naturally occurring proteins, where such variants are homologous or substantially similar to the naturally 
occurring protein, and can be of an origin of the same or different species as the naturally occurring 
protein (e.g., human, murine, or some other species that naturally expresses the recited polypeptide, 
usually a mammalian species). In general, variant polypeptides have a sequence that has at least about 
80%, usually at least about 90%, and more usually at least about 98% sequence identity with a 
differentially expressed polypeptide of the invention, as measured by BLAST 2.0 using the parameters 
described above. The variant polypeptides can be naturally or non-naturally glycosylated, i.e., the 
polypeptide has a glycosylation pattern that differs from the glycosylation pattern found in the 
corresponding naturally occurring protein. 

The invention also encompasses homologs of the disclosed polypeptides (or fragments thereof) 
where the homologs are isolated from other species, i.e. other animal or plant species, usually 
mammalian species, e.g. rodents, such as mice, rats; domestic animals, e.g., horse, 
cow, dog, cat; and humans. By "homolog" is meant a polypeptide having at least about 35%, usually at 
least about 40% and more usually at least about 60% amino acid sequence identity to a particular 
differentially expressed protein as identified above, where sequence identity is determined using the 
BLAST 2.0 algorithm, with the parameters described supra. 

In general, the polypeptides of the subject invention are provided in a non-naturally occurring 
environment, e.g. are separated from their naturally occurring environment. In certain embodiments, the 
subject protein is present in a composition that is enriched for the protein as compared to a control. As 
such, purified polypeptide is provided, where by purified is meant that the protein is present in a 
composition that is substantially free of non-differentially expressed polypeptides, where by 
substantially free is meant that less than 90%, usually less than 60% and more usually less than 50% of 
the composition is made up of non-differentially expressed polypeptides. 

Also within the scope of the invention are variants; variants of polypeptides include mutants, 
fragments, and fusions. Mutants can include amino acid substitutions, additions or deletions. The amino 
acid substitutions can be conservative amino acid substitutions or substitutions to eliminate 
non-essential amino acids, such as to alter a glycosylation site, a phosphorylation site or an acetylation site, 
or to minimize misfolding by substitution or deletion of one or more cysteine residues that are not 
necessary for function. Conservative amino acid substitutions are those that preserve the general charge, 
hydrophobicity/hydrophilicity, and/or steric bulk of the amino acid substituted. Variants can be 
designed so as to retain or have enhanced biological activity of a particular region of the protein (e.g., a 
functional domain and/or, where the polypeptide is a member of a protein family, a region associated 
with a consensus sequence). Selection of amino acid alterations for production of variants can be based 
upon the accessibility (interior vs. exterior) of the amino acid (see, e.g., Go et al., Int. J. Peptide Protein 
Res. (1980) 15:211), the thermostability of the variant polypeptide (see, e.g., Querol et al., Prot. Eng. 
(1996) 9:265), desired glycosylation sites (see, e.g., Olsen and Thomsen, J. Gen. Microbiol. (1991) 
137:579), desired disulfide bridges (see, e.g., Clarke et al., Biochemistry (1993) 32:4322; and 
Wakarchuk et al., Protein Eng. (1994) 7:1379), desired metal binding sites (see, e.g., Toma et al., 
Biochemistry (1991) 30:97, and Haezerbrouck et al., Protein Eng. (1993) 6:643), and desired 
substitutions within proline loops (see, e.g., Masul et al., Appl. Env. Microbiol. (1994) 60:3579). 
Cysteine-depleted muteins can be produced as disclosed in USPN 4,959,314. 

Variants also include fragments of the polypeptides disclosed herein, particularly biologically 
active fragments and/or fragments corresponding to functional domains. Fragments of interest will 
typically be at least about 10 aa to at least about 15 aa in length, usually at least about 50 aa in length, 
and can be as long as 300 aa in length or longer, but will usually not exceed about 1000 aa in length, 
where the fragment will have a stretch of amino acids that is identical to a polypeptide encoded by a 
polynucleotide having a sequence of any SEQ ID NOS: 1-2396, or a homolog thereof. The protein 
variants described herein are encoded by polynucleotides that are within the scope of the invention. The 
genetic code can be used to select the appropriate codons to construct the corresponding variants. 
Computer-Related Embodiments 

In general, a library of polynucleotides is a collection of sequence information, which 
information is provided in either biochemical form (e.g., as a collection of polynucleotide molecules), or 
in electronic form (e.g., as a collection of polynucleotide sequences stored in a computer-readable form, 
as in a computer system and/or as part of a computer program). The sequence information of the 
polynucleotides can be used in a variety of ways, e.g., as a resource for gene discovery, as a 
representation of sequences expressed in a selected cell type (e.g., cell type markers), and/or as markers 
of a given disease or disease state. In general, a disease marker is a representation of a gene product that 
is present in all cells affected by disease either at an increased or decreased level relative to a normal cell 
(e.g., a cell of the same or similar type that is not substantially affected by disease). For example, a 
polynucleotide sequence in a library can be a polynucleotide that represents an mRNA, polypeptide, or 
other gene product encoded by the polynucleotide, that is either overexpressed or underexpressed in a 
breast ductal cell affected by cancer relative to a normal (i.e., substantially disease-free) breast cell. 

The nucleotide sequence information of the library can be embodied in any suitable form, e.g., 
electronic or biochemical forms. For example, a library of sequence information embodied in electronic 
form comprises an accessible computer data file (or, in biochemical form, a collection of nucleic acid 
molecules) that contains the representative nucleotide sequences of genes that are differentially 
expressed (e.g., overexpressed or underexpressed) as between, for example, i) a cancerous cell and a 
normal cell; ii) a cancerous cell and a dysplastic cell; iii) a cancerous cell and a cell affected by a disease 
or condition other than cancer; iv) a metastatic cancerous cell and a normal cell and/or non-metastatic 
cancerous cell; v) a malignant cancerous cell and a non-malignant cancerous cell (or a normal cell) 
and/or vi) a dysplastic cell relative to a normal cell. Other combinations and comparisons of cells 
affected by various diseases or stages of disease will be readily apparent to the ordinarily skilled artisan. 
Biochemical embodiments of the library include a collection of nucleic acids that have the sequences of 
the genes in the library, where the nucleic acids can correspond to the entire gene in the library or to a 
fragment thereof as described in greater detail below. 

The polynucleotide libraries of the subject invention generally comprise sequence information of 
a plurality of polynucleotide sequences, where at least one of the polynucleotides has a sequence of any
of SEQ ID NOS: 1-2396. By plurality is meant at least 2, usually at least 3, and can include up to all of
SEQ ID NOS: 1-2396. The length and number of polynucleotides in the library will vary with the nature
of the library, e.g., if the library is an oligonucleotide array, a cDNA array, a computer database of the 
sequence information, etc. 

Where the library is an electronic library, the nucleic acid sequence information can be present in
a variety of media. "Media" refers to a manufacture, other than an isolated nucleic acid molecule, that
contains the sequence information of the present invention. Such a manufacture provides the genome 
sequence or a subset thereof in a form that can be examined by means not directly applicable to the 
sequence as it exists in a nucleic acid. For example, the nucleotide sequence of the present invention, 
e.g., the nucleic acid sequences of any of the polynucleotides of SEQ ID NOS: 1-2396, can be recorded
on computer readable media, e.g., any medium that can be read and accessed directly by a computer.
Such media include, but are not limited to: magnetic storage media, such as a floppy disc, a hard disc
storage medium, and a magnetic tape; optical storage media such as CD-ROM; electrical storage media
such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. One of
skill in the art can readily appreciate how any of the presently known computer readable media can be
used to create a manufacture comprising a recording of the present sequence information. "Recorded" 
refers to a process for storing information on a computer readable medium, using any method
known in the art. Any convenient data storage structure can be chosen, based on the means used to
access the stored information. A variety of data processor programs and formats can be used for storage,
e.g., word processing text file, database format, etc. In addition to the sequence information, electronic
versions of the libraries of the invention can be provided in conjunction or connection with other
computer-readable information and/or other types of computer-readable files (e.g., searchable files,
executable files, etc., including, but not limited to, search program software).
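
By way of illustration only, recording sequence information in a text file and reading it back can be sketched as follows. The sketch assumes a FASTA-style text layout, which is only one of many possible storage formats. The function names `write_fasta`, `read_fasta` and `roundtrip`, and the placeholder entries, are hypothetical and are not sequences of the Sequence Listing.

```python
import os
import tempfile


def write_fasta(records, path, width=60):
    """Record {identifier: sequence} pairs as FASTA-style text."""
    with open(path, "w", encoding="ascii") as handle:
        for name, seq in records.items():
            handle.write(f">{name}\n")
            # Wrap long sequences at a fixed line width.
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")


def read_fasta(path):
    """Read recorded sequences back into a dictionary."""
    records, name, chunks = {}, None, []
    with open(path, encoding="ascii") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                name, chunks = line[1:], []
            else:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Hypothetical placeholder entries used only to exercise the functions.
library = {
    "example_seq_1": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
    "example_seq_2": "GGGTATAAAAGGCCATGAAATAG",
}


def roundtrip(records):
    """Write the records to a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "library.fasta")
        write_fasta(records, path)
        return read_fasta(path)
```

A database format could be substituted for the text file without changing how the recorded information is later accessed.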

By providing the nucleotide sequence in computer readable form, the information can be
accessed for a variety of purposes. Computer software to access sequence information is publicly
available. For example, the gapped BLAST (Altschul et al., Nucleic Acids Res. (1997) 25:3389-3402)
and BLAZE (Brutlag et al., Comp. Chem. (1993) 17:203) search algorithms on a Sybase system can be
used to identify open reading frames (ORFs) within the genome that contain homology to ORFs from 
other organisms. 
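
As a simplified illustration of the first step of such an analysis, candidate ORFs can be located by scanning each forward reading frame for a start codon followed by an in-frame stop codon. The sketch below is not BLAST or BLAZE and performs no homology comparison; the function name `find_orfs` and the `min_codons` parameter are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    Only the first ATG after the previous stop is used as the start of each ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```

ORFs identified in this way could then be submitted to a homology search program of the kind cited above.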

As used herein, "a computer-based system" refers to the hardware means, software means, and
data storage means used to analyze the nucleotide sequence information of the present invention. The
minimum hardware of the computer-based systems of the present invention comprises a central
processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily
appreciate that any one of the currently available computer-based systems is suitable for use in the
present invention. The data storage means can comprise any manufacture comprising a recording of the
present sequence information as described above, or a memory access means that can access such a 
manufacture. 

"Search means" refers to one or more programs implemented on the computer-based system, to 
compare a target sequence or target structural motif, or expression levels of a polynucleotide in a sample,
with the stored sequence information. Search means can be used to identify fragments or regions of the
genome that match a particular target sequence or target motif. A variety of algorithms are
publicly known and commercially available, e.g., MacPattern (EMBL), BLASTN and BLASTX (NCBI).
A "target sequence" can be any polynucleotide or amino acid sequence of six or more contiguous
nucleotides or two or more amino acids, preferably from about 10 to 100 amino acids or from about 30
to 300 nt. A variety of comparing means can be used to accomplish comparison of sequence information
from a sample (e.g., to analyze target sequences, target motifs, or relative expression levels) with the
data storage means. A skilled artisan can readily recognize that any one of the publicly available
homology search programs can be used as the search means for the computer based systems of the
present invention to accomplish comparison of target sequences and motifs. Computer programs to
analyze expression levels in a sample and in controls are also known in the art.
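
A minimal sketch of a search means is given below for illustration. It assumes exact matching of a nucleotide target sequence against stored sequences on both strands, which is far simpler than the homology search programs named above. The names `search_library` and `reverse_complement` are hypothetical, and the stored sequences are supplied by the caller as a dictionary.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def search_library(target, library):
    """Return (identifier, strand, position) for every exact match of target.

    Positions are 0-based offsets in the stored (forward) sequence.
    """
    target = target.upper()
    if len(target) < 6:
        # The text defines a target sequence as six or more contiguous nucleotides.
        raise ValueError("target sequence must be at least six nucleotides")
    queries = [("+", target)]
    rc = reverse_complement(target)
    if rc != target:
        # Skip the minus strand for palindromic targets to avoid duplicate hits.
        queries.append(("-", rc))
    hits = []
    for name, seq in library.items():
        seq = seq.upper()
        for strand, query in queries:
            pos = seq.find(query)
            while pos != -1:
                hits.append((name, strand, pos))
                pos = seq.find(query, pos + 1)
    return hits
```

Exact matching is used here only to keep the example short; tolerance for mismatches and gaps would require one of the alignment programs referred to in the text.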

A "target structural motif," or "target motif," refers to any rationally selected sequence or
combination of sequences in which the sequence(s) are chosen based on a three-dimensional
configuration that is formed upon the folding of the target motif, or on consensus sequences of
regulatory or active sites. There are a variety of target motifs known in the art. Protein target motifs
include, but are not limited to, enzyme active sites and signal sequences. Nucleic acid target motifs
include, but are not limited to, hairpin structures, promoter sequences and other expression elements 
such as binding sites for transcription factors. 
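
For illustration, a search for a nucleic acid target motif defined by a consensus sequence can be sketched with a regular expression. The pattern below is a commonly cited TATA-box-like consensus and is used here purely as an assumed example; the dictionary `MOTIFS` and the function `find_motifs` are hypothetical names.

```python
import re

# Illustrative consensus pattern; an assumption for this sketch only.
MOTIFS = {
    "TATA-box-like": re.compile(r"TATA[AT]A[AT]"),
}


def find_motifs(seq, motifs=MOTIFS):
    """Return (motif name, 0-based start, matched text) for each non-overlapping match."""
    seq = seq.upper()
    found = []
    for name, pattern in motifs.items():
        for match in pattern.finditer(seq):
            found.append((name, match.start(), match.group()))
    return found
```

Motifs that depend on three-dimensional configuration, such as hairpin structures, generally cannot be captured by a single pattern of this kind and would call for a structure-aware search.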

A variety of structural formats for the input and output means can be used to input and output
the information in the computer-based systems of the present invention. One format for an output means
ranks the relative expression levels of different polynucleotides. Such presentation provides a skilled 
artisan with a ranking of relative expression levels to determine a gene expression profile. 
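
Such a ranked output can be sketched as follows. The function name `rank_expression` is hypothetical, and the expression levels are assumed to be supplied by the caller as numeric values keyed by polynucleotide identifier.

```python
def rank_expression(levels):
    """Return (rank, identifier, level) tuples ordered from highest to lowest level."""
    ordered = sorted(levels.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, level) for rank, (name, level) in enumerate(ordered, start=1)]
```

Because the sort is stable, polynucleotides with equal levels keep the order in which they were supplied.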

As discussed above, the "library" of the invention also encompasses biochemical libraries of the
polynucleotides of SEQ ID NOS: 1-2396, e.g., collections of nucleic acids representing the provided
polynucleotides. The biochemical libraries can take a variety of forms, e.g., a solution of cDNAs, a
pattern of probe nucleic acids stably associated with a surface of a solid support (i.e., an array) and the
like. Of particular interest are nucleic acid arrays in which one or more of SEQ ID NOS: 1-2396 is
represented on the array. By array is meant an article of manufacture that has at least a substrate with
at least two distinct nucleic acid targets on one of its surfaces, where the number of distinct nucleic acids
can be considerably higher, typically being at least 10 nt, usually at least 20 nt and often at least 25 nt.
A variety of different array formats have been developed and are known to those of skill in the art. The 
arrays of the subject invention find use in a variety of applications, including gene expression analysis, 
drug screening, mutation analysis and the like, as disclosed in the above-listed exemplary patent 
documents. 

In addition to the above nucleic acid libraries, analogous libraries of polypeptides are also
provided, where the polypeptides of the library will represent at least a portion of the
polypeptides encoded by SEQ ID NOS: 1-2396.
Utilities

Use of Polynucleotide Probes in Mapping, and in Tissue Profiling

Polynucleotide probes, generally comprising at least 12 contiguous nt of a polynucleotide as
shown in the Sequence Listing, are used for a variety of purposes, such as chromosome mapping of the
polynucleotide and detection of transcription levels. Additional disclosure about preferred regions of the
disclosed polynucleotide sequences is found in the Examples. A probe that hybridizes specifically to a
polynucleotide disclosed herein should provide a detection signal at least 5-, 10-, or 20-fold higher than
the background hybridization provided with other unrelated sequences.

Detection of Expression Levels. Nucleotide probes are used to detect expression of a gene
corresponding to the provided polynucleotide. In Northern blots, mRNA is separated electrophoretically
and contacted with a probe. A probe is detected as hybridizing to an mRNA species of a particular size.
The amount of hybridization is quantitated to determine relative amounts of expression, for example
under a particular condition. Probes are used for in situ hybridization to cells to detect expression.
Probes can also be used in vivo for diagnostic detection of hybridizing sequences. Probes are typically
labeled with a radioactive isotope. Other types of detectable labels can be used such as chromophores,
fluors, and enzymes. Other examples of nucleotide hybridization assays are described in WO 92/02526
and USPN 5,124,246.

The Polymerase Chain Reaction (PCR) is another means for detecting small
amounts of target nucleic acids (see, e.g., Mullis et al., Meth. Enzymol. (1987) 155:335; USPN
4,683,195; and USPN 4,683,202). Two primer polynucleotides that hybridize with the
target nucleic acids are used to prime the reaction. The primers can be composed of sequence within or
3' and 5' to the polynucleotides of the Sequence Listing. Alternatively, if the primers are 3' and 5' to these
polynucleotides, they need not hybridize to them or the complements. After amplification of the target
with a thermostable polymerase, the amplified target nucleic acids can be detected by methods known in
the art, e.g., Southern blot. mRNA or cDNA can also be detected by traditional blotting techniques (e.g.,
Southern blot, Northern blot, etc.) described in Sambrook et al., "Molecular Cloning: A Laboratory
Manual" (New York, Cold Spring Harbor Laboratory, 1989) (e.g., without PCR amplification). In
general, mRNA or cDNA generated from mRNA using a polymerase enzyme can be purified and
separated using gel electrophoresis, and transferred to a solid support, such as nitrocellulose. The solid
support is exposed to a labeled probe, washed to remove any unhybridized probe, and duplexes
containing the labeled probe are detected.

Mapping. Polynucleotides of the present invention can be used to identify a chromosome on 
which the corresponding gene resides. Such mapping can be useful in identifying the function of the 
polynucleotide-related gene by its proximity to other genes with known function. Function can also be
assigned to the polynucleotide-related gene when particular syndromes or diseases map to the same
chromosome. For example, use of polynucleotide probes in identification and quantification of nucleic 
acid sequence aberrations is described in USPN 5,783,387. An exemplary mapping method is 
fluorescence in situ hybridization (FISH), which facilitates comparative genomic hybridization to allow 
total genome assessment of changes in relative copy number of DNA sequences (see, e.g., Valdes et al.,
Methods in Molecular Biology (1997) 68:1). Polynucleotides can also be mapped to particular
chromosomes using, for example, radiation hybrids or chromosome-specific hybrid panels. See Leach et
al., Advances in Genetics, (1995) 33:63-99; Walter et al., Nature Genetics (1994) 7:22; Walter and
Goodfellow, Trends in Genetics (1992) 9:352. Panels for radiation hybrid mapping are available from
Research Genetics, Inc., Huntsville, Alabama, USA. Databases for markers using various panels are 
publicly available via the world wide web from the Stanford Genome Center and The Whitehead 
Institute for Biomedical Research/MIT Center for Genome Research. The statistical program RHMAP 
can be used to construct a map based on the data from radiation hybridization with a measure of the 
relative likelihood of one order versus another. RHMAP is available via the world wide web from the 
University of Michigan, Center for Statistical Genetics, Ann Arbor, Michigan. In addition, 
commercial programs are available for identifying regions of chromosomes commonly associated with 
disease, such as cancer. 

Tissue Typing or Profiling. Expression of specific mRNA corresponding to the provided 
polynucleotides can vary in different cell types and can be tissue-specific. This variation of mRNA 
levels in different cell types can be exploited with nucleic acid probe assays to determine tissue types. 
For example, PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid probes 
substantially identical or complementary to polynucleotides listed in the Sequence Listing can determine 
the presence or absence of the corresponding cDNA or mRNA.

Tissue typing can be used to identify the developmental organ or tissue source of a metastatic 
lesion by identifying the expression of a particular marker of that organ or tissue. If a polynucleotide is 
expressed only in a specific tissue type, and a metastatic lesion is found to express that polynucleotide, 
then the developmental source of the lesion has been identified. Expression of a particular 
polynucleotide can be assayed by detection of either the corresponding mRNA or the protein product. 
As would be readily apparent to any forensic scientist, the sequences disclosed herein are useful in 
differentiating human tissue from non-human tissue. In particular, these sequences are useful to 
differentiate human tissue from bird, reptile, and amphibian tissue, for example. 

Use of Polymorphisms. A polynucleotide of the invention can be used in forensics, genetic 
analysis, mapping, and diagnostic applications where the corresponding region of a gene is polymorphic 
in the human population. Any means for detecting a polymorphism in a gene can be used, including, but
not limited to, electrophoresis of protein polymorphic variants, differential sensitivity to restriction
enzyme cleavage, and hybridization to allele-specific probes.
Antibody Production 

Expression products of a polynucleotide of the invention, as well as the corresponding mRNA, 
cDNA, or complete gene, can be prepared and used for raising antibodies for experimental, diagnostic,
and therapeutic purposes. For polynucleotides to which a corresponding gene has not been assigned, this 
provides an additional method of identifying the corresponding gene. The polynucleotide or related 
cDNA is expressed as described above, and antibodies are prepared. These antibodies are specific to an 
epitope on the polypeptide encoded by the polynucleotide, and can precipitate or bind to the 
corresponding native protein in a cell or tissue preparation or in a cell-free extract of an in vitro
expression system. 

Methods for production of antibodies that specifically bind a selected antigen are well known in 
the art. Immunogens for raising antibodies can be prepared by mixing a polypeptide encoded by a 
polynucleotide of the invention with an adjuvant, and/or by making fusion proteins with larger
immunogenic proteins. Polypeptides can also be covalently linked to other larger immunogenic proteins,
such as keyhole limpet hemocyanin. Immunogens are typically administered intradermally,
subcutaneously, or intramuscularly to experimental animals such as rabbits, sheep, and mice, to generate
antibodies. Monoclonal antibodies can be generated by isolating spleen
cells and fusing them with myeloma cells to form hybridomas. Alternatively, the selected polynucleotide is
administered directly, such as by intramuscular injection, and expressed in vivo. The expressed protein
generates a variety of protein-specific immune responses, including production of antibodies, 
comparable to administration of the protein. 

Preparations of polyclonal and monoclonal antibodies specific for polypeptides encoded by a 
selected polynucleotide are made using standard methods known in the art. The antibodies specifically
bind to epitopes present in the polypeptides encoded by polynucleotides disclosed in the Sequence
Listing. Typically, at least 6, 8, 10, or 12 contiguous amino acids are required to form an epitope.
Epitopes that involve non-contiguous amino acids may require a longer polypeptide, e.g., at least 15, 25,
or 50 amino acids. Antibodies that specifically bind to human polypeptides encoded by the provided
polynucleotides should provide a detection signal at least 5-, 10-, or 20-fold higher than a detection signal
provided with other proteins when used in Western blots or other immunochemical assays. Preferably,
antibodies that specifically bind polypeptides of the invention do not bind to other proteins in
immunochemical assays at detectable levels and can immunoprecipitate the specific polypeptide from 
solution. 






WO 01/66753 



PCT/US01/07787 



The invention also contemplates naturally occurring antibodies specific for a polypeptide of the 
invention. For example, serum antibodies to a polypeptide of the invention in a human population can 
be purified by methods well known in the art, e.g., by passing antiserum over a column to which the 
corresponding selected polypeptide or fusion protein is bound. The bound antibodies can then be eluted 
from the column, for example using a buffer with a high salt concentration.

In addition to the antibodies discussed above, the invention also contemplates genetically
engineered antibodies and antibody derivatives (e.g., single chain antibodies, antibody fragments (e.g., Fab,
etc.)), prepared according to methods well known in the art.
Polynucleotides or Arrays for Diagnostics 

Polynucleotide arrays provide a high throughput technique that can assay a large number of
polynucleotide sequences in a sample. This technology can be used as a diagnostic and as a tool to test
for differential expression, e.g., to determine function of an encoded protein. Arrays can be created by
spotting polynucleotide probes onto a substrate (e.g., glass, nitrocellulose, etc.) in a two-dimensional
matrix or array having bound probes. The probes can be bound to the substrate by either covalent bonds
or by non-specific interactions, such as hydrophobic interactions. Samples of polynucleotides can be
detectably labeled (e.g., using radioactive or fluorescent labels) and then hybridized to the probes. 
Double stranded polynucleotides, comprising the labeled sample polynucleotides bound to probe 
polynucleotides, can be detected once the unbound portion of the sample is washed away. Techniques 
for constructing arrays and methods of using these arrays are described in EP 799 897; WO 97/29212;
WO 97/27317; EP 785 280; WO 97/02357; USPN 5,593,839; USPN 5,578,832; EP 728 520; USPN
5,599,695; EP 721 016; USPN 5,556,752; WO 95/22058; and USPN 5,631,734. Arrays can be used
to, for example, examine differential expression of genes and can be used to determine gene function.
For example, arrays can be used to detect differential expression of a polynucleotide between a test cell
and control cell (e.g., cancer cells and normal cells). For example, high expression of a particular
message in a cancer cell, which is not observed in a corresponding normal cell, can indicate a cancer
specific gene product. Exemplary uses of arrays are further described in, for example, Pappalarado et
al., Sem. Radiation Oncol. (1998) 8:217; and Ramsay, Nature Biotechnol. (1998) 16:40.
Differential Expression in Diagnosis 

The polynucleotides of the invention can also be used to detect differences in expression levels
between two cells, e.g., as a method to identify abnormal or diseased tissue in a human. For
polynucleotides corresponding to profiles of protein families, the choice of tissue can be selected
according to the putative biological function. In general, the expression of a gene corresponding to a 
specific polynucleotide is compared between a first tissue that is suspected of being diseased and a 
second, normal tissue of the human. The tissue suspected of being abnormal or diseased can be derived
from a different tissue type of the human, but preferably it is derived from the same tissue type; for
example, an intestinal polyp or other abnormal growth should be compared with normal intestinal tissue.
The normal tissue can be the same tissue as that of the test sample, or any normal tissue of the patient,
especially those that express the polynucleotide-related gene of interest (e.g., brain, thymus, testis, heart,
prostate, placenta, spleen, small intestine, skeletal muscle, pancreas, and the mucosal lining of the colon).
A difference between the polynucleotide-related gene, mRNA, or protein in the two tissues which are
compared, for example in molecular weight, amino acid or nucleotide sequence, or relative abundance,
indicates a change in the gene, or a gene which regulates it, in the tissue of the human that was suspected
of being diseased. Examples of detection of differential expression and its use in diagnosis of cancer are
described in USPNs 5,688,641 and 5,677,125.

A genetic predisposition to disease in a human can also be detected by comparing expression
levels of an mRNA or protein corresponding to a polynucleotide of the invention in a fetal tissue with
levels associated with normal fetal tissue. Fetal tissues that are used for this purpose include, but are not
limited to, amniotic fluid, chorionic villi, blood, and the blastomere of an in vitro-fertilized embryo. The
comparable normal polynucleotide-related gene is obtained from any tissue. The mRNA or protein is
obtained from a normal tissue of a human in which the polynucleotide-related gene is expressed. 
Differences such as alterations in the nucleotide sequence or size of the same product of the fetal 
polynucleotide-related gene or mRNA, or alterations in the molecular weight, amino acid sequence, or 
relative abundance of fetal protein, can indicate a germline mutation in the polynucleotide-related gene of
the fetus, which indicates a genetic predisposition to disease. In general, diagnostic, prognostic, and other
methods of the invention based on differential expression involve detection of a level or amount of a
gene product, particularly a differentially expressed gene product, in a test sample obtained from a
patient suspected of having or being susceptible to a disease (e.g., breast cancer, lung cancer, colon
cancer and/or metastatic forms thereof), and comparing the detected levels to those levels found in
normal cells (e.g., cells substantially unaffected by cancer) and/or other control cells (e.g., to
differentiate a cancerous cell from a cell affected by dysplasia). Furthermore, the severity of the disease
can be assessed by comparing the detected levels of a differentially expressed gene product with those
levels detected in samples representing the levels of differentially expressed gene product associated with
varying degrees of severity of disease. It should be noted that use of the term "diagnostic" herein is not
necessarily meant to exclude "prognostic" or "prognosis," but rather is used as a matter of convenience.

The term "differentially expressed gene" is generally intended to encompass a polynucleotide 
that can, for example, include an open reading frame encoding a gene product (e.g., a polypeptide), 
and/or introns of such genes and adjacent 5' and 3' non-coding nucleotide sequences involved in the
regulation of expression, up to about 20 kb beyond the coding region, but possibly further in either
direction. The gene can be introduced into an appropriate vector for extrachromosomal maintenance or
for integration into a host genome. In general, a difference in expression level associated with a decrease
in expression level of at least about 25%, usually at least about 50% to 75%, more usually at least about
90% or more is indicative of a differentially expressed gene of interest, i.e., a gene that is underexpressed
or down-regulated in the test sample relative to a control sample. Furthermore, a difference in
expression level associated with an increase in expression of at least about 25%, usually at least about
50% to 75%, more usually at least about 90% and can be at least about 1½-fold, usually at least about
2-fold to about 10-fold, and can be about 100-fold to about 1,000-fold increase relative to a control
sample is indicative of a differentially expressed gene of interest, i.e., an overexpressed or up-regulated
gene.
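
The threshold comparison described above can be illustrated with a short sketch. It assumes that expression levels are positive numeric measurements on a common scale and applies the minimum change of about 25% as the default cutoff; the function name `classify_expression` and the parameter `min_change` are hypothetical.

```python
def classify_expression(test_level, control_level, min_change=0.25):
    """Classify a gene product by its fractional change relative to a control.

    Returns (status, fold), where fold is test_level / control_level.
    """
    if control_level <= 0:
        raise ValueError("control level must be positive")
    fold = test_level / control_level
    change = fold - 1.0  # e.g. 0.25 for a 25% increase, -0.25 for a 25% decrease
    if change >= min_change:
        status = "overexpressed"
    elif change <= -min_change:
        status = "underexpressed"
    else:
        status = "not differentially expressed"
    return status, fold
```

A stricter cutoff, such as 0.5 or 0.9, can be passed as `min_change` to reflect the higher thresholds recited in the text.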

"Differentially expressed polynucleotide" as used herein means a nucleic acid molecule (RNA or 
DNA) comprising a sequence that represents a differentially expressed gene, e.g., the differentially 
expressed polynucleotide comprises a sequence (e.g., an open reading frame encoding a gene product) 
that uniquely identifies a differentially expressed gene so that detection of the differentially expressed 
polynucleotide in a sample is correlated with the presence of a differentially expressed gene in a sample.
"Differentially expressed polynucleotides" is also meant to encompass fragments of the disclosed 
polynucleotides, e.g., fragments retaining biological activity, as well as nucleic acids homologous, 
substantially similar, or substantially identical (e.g., having about 90% sequence identity) to the 
disclosed polynucleotides. 

"Diagnosis" as used herein generally includes determination of a subject's susceptibility to a
disease or disorder, determination as to whether a subject is presently affected by a disease or disorder,
as well as the prognosis of a subject affected by a disease or disorder (e.g., identification of pre-
metastatic or metastatic cancerous states, stages of cancer, or responsiveness of cancer to therapy). The
present invention particularly encompasses diagnosis of subjects in the context of breast cancer (e.g.,
carcinoma in situ (e.g., ductal carcinoma in situ), estrogen receptor (ER)-positive breast cancer, ER-
negative breast cancer, or other forms and/or stages of breast cancer), lung cancer (e.g., small cell
carcinoma, non-small cell carcinoma, mesothelioma, and other forms and/or stages of lung cancer), and 
colon cancer (e.g., adenomatous polyp, colorectal carcinoma, and other forms and/or stages of colon 
cancer). 

"Sample" or "biological sample" as used throughout herein is generally meant to refer to samples
of biological fluids or tissues, particularly samples obtained from tissues, especially from cells of the
type associated with the disease for which the diagnostic application is designed (e.g., ductal 
adenocarcinoma), and the like. "Samples" is also meant to encompass derivatives and fractions of such
samples (e.g., cell lysates). Where the sample is solid tissue, the cells of the tissue can be dissociated or
tissue sections can be analyzed. 

Methods of the subject invention useful in diagnosis or prognosis typically involve comparison 
of the abundance of a selected differentially expressed gene product in a sample of interest with that of a 
control to determine any relative differences in the expression of the gene product, where the difference
can be measured qualitatively and/or quantitatively. Quantitation can be accomplished, for example, by 
comparing the level of expression product detected in the sample with the amounts of product present in 
a standard curve. A comparison can be made visually; by using a technique such as densitometry, with 
or without computerized assistance; by preparing a representative library of cDNA clones of mRNA
isolated from a test sample, sequencing the clones in the library to determine the number of cDNA
clones corresponding to the same gene product, and analyzing the number of clones corresponding to
that same gene product relative to the number of clones of the same gene product in a control sample; or
by using an array to detect relative levels of hybridization to a selected sequence or set of sequences, and
comparing the hybridization pattern to that of a control. The differences in expression are then
correlated with the presence or absence of an abnormal expression pattern. A variety of different
methods for determining the nucleic acid abundance in a sample are known to those of skill in the art
(see, e.g., WO 97/27317).
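
Quantitation against a standard curve can be sketched as follows, for illustration only. The sketch assumes that the detection signal is approximately linear in the amount of product over the range of the standards and fits the curve by ordinary least squares; the function names `fit_standard_curve` and `estimate_amount` are hypothetical.

```python
def fit_standard_curve(amounts, signals):
    """Fit signal = slope * amount + intercept to the standards by least squares."""
    n = len(amounts)
    if n < 2 or n != len(signals):
        raise ValueError("need at least two paired standards")
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    if sxx == 0:
        raise ValueError("standards must span more than one amount")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_amount(signal, slope, intercept):
    """Invert the fitted curve to estimate the amount giving the observed signal."""
    if slope == 0:
        raise ValueError("a flat standard curve cannot be inverted")
    return (signal - intercept) / slope
```

The same estimate can be computed for a test sample and for a control, and the two amounts then compared to determine the relative difference in expression.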

In general, diagnostic assays of the invention involve detection of a gene product of a
polynucleotide sequence (e.g., mRNA or polypeptide) that corresponds to a sequence of SEQ ID NOS: 1-
2396. The patient from whom the sample is obtained can be apparently healthy, susceptible to disease
(e.g., as determined by family history or exposure to certain environmental factors), or can already be
identified as having a condition in which altered expression of a gene product of the invention is 
implicated. 

Diagnosis can be determined based on detected gene product expression levels of a gene product 
encoded by at least one, preferably at least two or more, at least 3 or more, or at least 4 or more of the
polynucleotides having a sequence set forth in SEQ ID NOS: 1-2396, and can involve detection of 
expression of genes corresponding to all of SEQ ID NOS: 1-2396 and/or additional sequences that can 
serve as additional diagnostic markers and/or reference sequences. Where the diagnostic method is 
designed to detect the presence or susceptibility of a patient to cancer, the assay preferably involves 
detection of a gene product encoded by a gene corresponding to a polynucleotide that is differentially
expressed in cancer. Examples of such differentially expressed polynucleotides are described in the 
Examples below. Given the provided polynucleotides and information regarding their relative 
expression levels provided herein, assays using such polynucleotides and detection of their expression 
levels in diagnosis and prognosis will be readily apparent to the ordinarily skilled artisan.
Any of a variety of detectable labels can be used in connection with the various embodiments of
the diagnostic methods of the invention. Suitable detectable labels include fluorochromes (e.g.,
fluorescein isothiocyanate (FITC), rhodamine, Texas Red, phycoerythrin, allophycocyanin, 6-
carboxyfluorescein (6-FAM), 2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein, 6-carboxy-X-
rhodamine (ROX), 6-carboxy-2',4',7',4,7-hexachlorofluorescein (HEX), 5-carboxyfluorescein (5-FAM)
or N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA)), radioactive labels (e.g., 32P, 35S, 3H, etc.),
and the like. The detectable label can involve a two stage system (e.g., biotin-avidin, hapten-anti-
hapten antibody, etc.).

Reagents specific for the polynucleotides and polypeptides of the invention, such as antibodies
and nucleotide probes, can be supplied in a kit for detecting the presence of an expression product in a
biological sample. The kit can also contain buffers or labeling components, as well as instructions for 
using the reagents to detect and quantify expression products in the biological sample. Exemplary 
embodiments of the diagnostic methods of the invention are described below in more detail. 

Polypeptide detection in diagnosis. In one embodiment, the test sample is assayed for the level
of a differentially expressed polypeptide. Diagnosis can be accomplished using any of a number of
methods to determine the absence or presence or altered amounts of the differentially expressed 
polypeptide in the test sample. For example, detection can utilize staining of cells or histological 
sections with labeled antibodies, performed in accordance with conventional methods. Cells can be 
permeabilized to stain cytoplasmic molecules. In general, antibodies that specifically bind a 

20 differentially expressed polypeptide of the invention are added to a sample, and incubated for a period of 
time sufficient to allow binding to the epitope, usually at least about 10 minutes. The antibody can be 
detectably labeled for direct detection (e.g., using radioisotopes, enzymes, fluoresces, chemiluminescers, 
and the like), or can be used in conjunction with a second stage antibody or reagent to detect binding 
(e.g., biotin with horseradish peroxidase-conjugated avidin, a secondary antibody conjugated to a 

25 fluorescent compound, e.g. fluorescein, rhodamine, Texas red, etc.). The absence or presence of 

antibody binding can be determined by various methods, including flow cytometry of dissociated cells, 
microscopy, radiography, scintillation counting, etc. Any suitable alternative methods can of qualitative 
or quantitative detection of levels or amounts of differentially expressed polypeptide can be used, for 
example ELIS A, western blot, immunoprecipitation, radioimmunoassay, etc. 

mRNA detection. The diagnostic methods of the invention can also or alternatively involve
detection of mRNA encoded by a gene corresponding to a differentially expressed polynucleotide of the
invention. Any suitable qualitative or quantitative methods known in the art for detecting specific
mRNAs can be used. mRNA can be detected by, for example, in situ hybridization in tissue sections, by
reverse transcriptase-PCR, or in Northern blots containing poly A+ mRNA. One of skill in the art can
readily use these methods to determine differences in the size or amount of mRNA transcripts between 
two samples. mRNA expression levels in a sample can also be determined by generation of a library of 
expressed sequence tags (ESTs) from the sample, where the EST library is representative of sequences 
present in the sample (Adams, et al., (1991) Science 252:1651). Enumeration of the relative 
representation of ESTs within the library can be used to approximate the relative representation of the
gene transcript within the starting sample. The results of EST analysis of a test sample can then be
compared to EST analysis of a reference sample to determine the relative expression levels of a selected
polynucleotide, particularly a polynucleotide corresponding to one or more of the differentially expressed
genes described herein. Alternatively, gene expression in a test sample can be analyzed using serial
analysis of gene expression (SAGE) methodology (e.g., Velculescu et al., Science (1995) 270:484) or
differential display (DD) methodology (see, e.g., U.S. 5,776,683; and U.S. 5,807,680).
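
As a minimal sketch of the EST enumeration described above, the following Python fragment counts the ESTs assigned to each gene in a test library and in a reference library, converts the counts to relative representations, and reports the test-to-reference ratio for a selected gene. The gene names, the function names, and the small pseudocount used to avoid division by zero are illustrative assumptions and are not part of the disclosure.

```python
from collections import Counter


def relative_representation(est_gene_assignments):
    """Return {gene: fraction of the library's ESTs assigned to that gene}."""
    counts = Counter(est_gene_assignments)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def expression_ratio(test_ests, reference_ests, gene, pseudocount=1e-6):
    """Ratio of relative representation (test / reference) for one gene.

    The pseudocount is an assumption added so that a gene absent from the
    reference library does not cause a division by zero.
    """
    test = relative_representation(test_ests).get(gene, 0.0)
    ref = relative_representation(reference_ests).get(gene, 0.0)
    return (test + pseudocount) / (ref + pseudocount)


# Hypothetical libraries: each entry names the gene to which one EST was assigned.
test_library = ["geneA"] * 6 + ["geneB"] * 2 + ["geneC"] * 2
reference_library = ["geneA"] * 2 + ["geneB"] * 2 + ["geneC"] * 6
```

With these hypothetical libraries, geneA accounts for 60% of the test ESTs and 20% of the reference ESTs, so its ratio is close to 3, suggesting higher relative expression in the test sample.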

Alternatively, gene expression can be analyzed using hybridization analysis. Oligonucleotides
or cDNA can be used to selectively identify or capture DNA or RNA of specific sequence composition,
and the amount of RNA or cDNA hybridized to a known capture sequence determined qualitatively or
quantitatively, to provide information about the relative representation of a particular message within the
pool of cellular messages in a sample. Hybridization analysis can be designed to allow for concurrent
screening of the relative expression of hundreds to thousands of genes by using, for example, array-based
technologies having high density formats, including filters, microscope slides, or microchips, or
solution-based technologies that use spectroscopic analysis (e.g., mass spectrometry). One exemplary
use of arrays in the diagnostic methods of the invention is described below in more detail.

Use of a single gene in diagnostic applications. The diagnostic methods of the invention can
focus on the expression of a single differentially expressed gene. For example, the diagnostic method
can involve detecting a differentially expressed gene, or a polymorphism of such a gene (e.g., a
polymorphism in a coding region or control region), that is associated with disease. Disease-associated
polymorphisms can include deletion or truncation of the gene, mutations that alter expression level
and/or affect activity of the encoded protein, etc.

A number of methods are available for analyzing nucleic acids for the presence of a specific
sequence, e.g., a disease-associated polymorphism. Where large amounts of DNA are available, genomic
DNA is used directly. Alternatively, the region of interest is cloned into a suitable vector and grown in
sufficient quantity for analysis. Cells that express a differentially expressed gene can be used as a source
of mRNA, which can be assayed directly or reverse transcribed into cDNA for analysis. The nucleic acid
can be amplified by conventional techniques, such as the polymerase chain reaction (PCR), to provide
sufficient amounts for analysis, and a detectable label can be included in the amplification reaction (e.g.,
using a detectably labeled primer or detectably labeled oligonucleotides) to facilitate detection.
Alternatively, various methods are also known in the art that utilize oligonucleotide ligation as a means
of detecting polymorphisms; see, e.g., Riley et al., Nucl. Acids Res. (1990) 18:2887; and Delahunty et
al., Am. J. Hum. Genet. (1996) 58:1239.

The amplified or cloned sample nucleic acid can be analyzed by one of a number of methods
known in the art. The nucleic acid can be sequenced by dideoxy or other methods, and the sequence of
bases compared to a selected sequence, e.g., to a wild-type sequence. Hybridization with the
polymorphic or variant sequence can also be used to determine its presence in a sample (e.g., by
Southern blot, dot blot, etc.). The hybridization pattern of a polymorphic or variant sequence and a
control sequence to an array of oligonucleotide probes immobilized on a solid support, as described in
US 5,445,934, or in WO 95/35505, can also be used as a means of identifying polymorphic or variant
sequences associated with disease. Single strand conformational polymorphism (SSCP) analysis,
denaturing gradient gel electrophoresis (DGGE), and heteroduplex analysis in gel matrices are used to
detect conformational changes created by DNA sequence variation as alterations in electrophoretic
mobility. Alternatively, where a polymorphism creates or destroys a recognition site for a restriction
endonuclease, the sample is digested with that endonuclease, and the products size fractionated to
determine whether the fragment was digested. Fractionation is performed by gel or capillary
electrophoresis, particularly acrylamide or agarose gels.
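
The restriction-site approach above can be illustrated with a short in silico digest. The following Python sketch is offered only as an illustration under stated assumptions: the two 20-nucleotide sequences are hypothetical, the function name is invented for this example, and only the top strand of a linear fragment is considered. It uses the EcoRI recognition site GAATTC, which is cut after the first G.

```python
def fragment_sizes(sequence, site, cut_offset):
    """Sizes of the fragments produced by complete digestion of a linear sequence.

    `site` is the recognition sequence; `cut_offset` is the number of bases
    into the site after which the top strand is cut.
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


# Hypothetical amplified region; the variant carries an A-to-G change that
# destroys the GAATTC site present in the wild-type sequence.
wild_type = "ATATATGAATTCGCGCGCGC"
variant = "ATATATGAGTTCGCGCGCGC"

wild_type_fragments = fragment_sizes(wild_type, "GAATTC", 1)
variant_fragments = fragment_sizes(variant, "GAATTC", 1)
```

Here the wild-type sequence yields two fragments (7 nt and 13 nt), while the variant remains a single undigested 20 nt fragment, which is the size difference that fractionation would reveal.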

Screening for mutations in a gene can be based on the functional or antigenic characteristics of 
the protein. Protein truncation assays are useful in detecting deletions that can affect the biological
activity of the protein. Various immunoassays designed to detect polymorphisms in proteins can be used
in screening. Where many diverse genetic mutations lead to a particular disease phenotype, functional 
protein assays have proven to be effective screening tools. The activity of the encoded protein can be 
determined by comparison with the wild-type protein. 

Pattern matching in diagnosis using arrays. In another embodiment, the diagnostic and/or
prognostic methods of the invention involve detection of expression of a selected set of genes in a test
sample to produce a test expression pattern (TEP). The TEP is compared to a reference expression
pattern (REP), which is generated by detection of expression of the selected set of genes in a reference
sample (e.g., a positive or negative control sample). The selected set of genes includes at least one of the
genes of the invention, which genes correspond to the polynucleotide sequences of SEQ ID NOS: 1-2396.
Of particular interest is a selected set of genes that includes genes differentially expressed in the disease
for which the test sample is to be screened.

"Reference sequences" or "reference polynucleotides" as used herein in the context of
differential gene expression analysis and diagnosis/prognosis refer to a selected set of polynucleotides,
which selected set includes at least one or more of the differentially expressed polynucleotides described
herein. A plurality of reference sequences, preferably comprising positive and negative control
sequences, can be included as reference sequences. Additional suitable reference sequences are found in
GenBank, Unigene, and other nucleotide sequence databases (including, e.g., expressed sequence tag
(EST), partial, and full-length sequences).
"Reference array" means an array having reference sequences for use in hybridization with a
sample, where the reference sequences include all, at least one of, or any subset of the differentially
expressed polynucleotides described herein. Usually such an array will include at least 3 different
reference sequences, and can include any one or all of the provided differentially expressed sequences.
Arrays of interest can further comprise sequences, including polymorphisms, of other genetic sequences,
particularly other sequences of interest for screening for a disease or disorder (e.g., cancer, dysplasia, or
other related or unrelated diseases, disorders, or conditions). The oligonucleotide sequence on the array
will usually be at least about 12 nt in length, and can be of about the length of the provided sequences, or
can extend into the flanking regions to generate fragments of 100 nt to 200 nt in length or more.
Reference arrays can be produced according to any suitable methods known in the art. For example,
methods of producing large arrays of oligonucleotides are described in U.S. 5,134,854, and U.S.
5,445,934 using light-directed synthesis techniques. Using a computer controlled system, a
heterogeneous array of monomers is converted, through simultaneous coupling at a number of reaction
sites, into a heterogeneous array of polymers. Alternatively, microarrays are generated by deposition of
pre-synthesized oligonucleotides onto a solid substrate, for example as described in PCT published
application no. WO 95/35505.

A "reference expression pattern" or "REP" as used herein refers to the relative levels of
expression of a selected set of genes, particularly of differentially expressed genes, that is associated
with a selected cell type, e.g., a normal cell, a cancerous cell, a cell exposed to an environmental
stimulus, and the like. A "test expression pattern" or "TEP" refers to relative levels of expression of a
selected set of genes, particularly of differentially expressed genes, in a test sample (e.g., a cell of
unknown or suspected disease state, from which mRNA is isolated).

REPs can be generated in a variety of ways according to methods well known in the art. For
example, REPs can be generated by hybridizing a control sample to an array having a selected set of
polynucleotides (particularly a selected set of differentially expressed polynucleotides), acquiring the
hybridization data from the array, and storing the data in a format that allows for ready comparison of
the REP with a TEP. Alternatively, all expressed sequences in a control sample can be isolated and
sequenced, e.g., by isolating mRNA from a control sample, converting the mRNA into cDNA, and
sequencing the cDNA. The resulting sequence information roughly or precisely reflects the identity and
relative number of expressed sequences in the sample. The sequence information can then be stored in a
format (e.g., a computer-readable format) that allows for ready comparison of the REP with a TEP. The
REP can be normalized prior to or after data storage, and/or can be processed to selectively remove
sequences of expressed genes that are of less interest or that might complicate analysis (e.g., some or all
of the sequences associated with housekeeping genes can be eliminated from REP data).
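
The normalization, filtering, and storage steps described above might be sketched as follows. This Python fragment is a simplified illustration, not a prescribed implementation: normalization to the total signal is only one of many possible schemes, JSON is assumed as the computer-readable format, and the gene names (including ACTB as an example housekeeping gene) and signal values are hypothetical.

```python
import json


def build_expression_pattern(raw_signals, housekeeping=()):
    """Normalize raw signals to fractions of the total, then drop excluded genes.

    Normalizing to the total signal is an assumption made for this sketch.
    """
    total = sum(raw_signals.values())
    if total <= 0:
        raise ValueError("total signal must be positive")
    return {
        gene: value / total
        for gene, value in raw_signals.items()
        if gene not in housekeeping
    }


def serialize_pattern(pattern):
    """Store the pattern as JSON text (one possible computer-readable format)."""
    return json.dumps(pattern, sort_keys=True)


def deserialize_pattern(text):
    """Recover a stored pattern for later comparison with a TEP."""
    return json.loads(text)


# Hypothetical hybridization signals from a control sample.
control_signals = {"geneA": 500.0, "geneB": 1500.0, "ACTB": 8000.0}
rep = build_expression_pattern(control_signals, housekeeping={"ACTB"})
stored_rep = serialize_pattern(rep)
```

A TEP could be produced by passing the test sample's signals through the same function, so that stored REPs and newly generated TEPs share one format.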
TEPs can be generated in a manner similar to REPs, e.g., by hybridizing a test sample to an
array having a selected set of polynucleotides, particularly a selected set of differentially expressed
polynucleotides, acquiring the hybridization data from the array, and storing the data in a format that
allows for ready comparison of the TEP with a REP. The REP and TEP to be used in a comparison can
be generated simultaneously, or the TEP can be compared to previously generated and stored REPs.

In one embodiment of the invention, comparison of a TEP with a REP involves hybridizing a
test sample with a reference array, where the reference array has one or more reference sequences for use
in hybridization with a sample. The reference sequences include all, at least one of, or any subset of the
differentially expressed polynucleotides described herein. Hybridization data for the test sample is
acquired, the data normalized, and the produced TEP compared with a REP generated using an array
having the same or similar selected set of differentially expressed polynucleotides. Probes that
correspond to sequences differentially expressed between the two samples will show decreased or
increased hybridization efficiency for one of the samples relative to the other.

Methods for collection of data from hybridization of samples with a reference array are well
known in the art. For example, the polynucleotides of the reference and test samples can be generated
using a detectable fluorescent label, and hybridization of the polynucleotides in the samples detected by
scanning the microarrays for the presence of the detectable label using, for example, a microscope and
light source for directing light at a substrate. A photon counter detects fluorescence from the substrate,
while an x-y translation stage varies the location of the substrate. A confocal detection device that can
be used in the subject methods is described in USPN 5,631,734. A scanning laser microscope is
described in Shalon et al., Genome Res. (1996) 6:639. A scan, using the appropriate excitation line, is
performed for each fluorophore used. The digital images generated from the scan are then combined for
subsequent analysis. For any particular array element, the fluorescent signal from one
sample (e.g., a test sample) is compared to the fluorescent signal from another sample (e.g., a reference
sample), and the relative signal intensity determined.
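
The per-element comparison of signals can be illustrated with a brief calculation. The Python sketch below is an assumption-laden example rather than the method of any cited reference: it expresses the relative signal intensity as a base-2 logarithm of the test-to-reference ratio, applies a hypothetical signal floor to avoid taking the logarithm of zero, and uses invented element names and intensities.

```python
import math


def log2_ratios(test_signal, reference_signal, floor=1.0):
    """Return {element: log2(test / reference)} for elements present in both scans.

    Signals below `floor` are raised to `floor`; the floor is an assumption
    added so that empty elements do not produce a logarithm of zero.
    """
    ratios = {}
    for element in test_signal.keys() & reference_signal.keys():
        test = max(test_signal[element], floor)
        reference = max(reference_signal[element], floor)
        ratios[element] = math.log2(test / reference)
    return ratios


# Hypothetical background-corrected intensities for three array elements.
test_scan = {"spot1": 4000.0, "spot2": 500.0, "spot3": 1000.0}
reference_scan = {"spot1": 1000.0, "spot2": 2000.0, "spot3": 1000.0}
ratios = log2_ratios(test_scan, reference_scan)
```

On this scale a value of +2 corresponds to a fourfold stronger signal in the test sample, -2 to a fourfold weaker signal, and 0 to no difference.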

Methods for analyzing the data collected from hybridization to arrays are well known in the art.
For example, where detection of hybridization involves a fluorescent label, data analysis can include the
steps of determining fluorescent intensity as a function of substrate position from the data collected,
removing outliers, i.e., data deviating from a predetermined statistical distribution, and calculating the
relative binding affinity of the targets from the remaining data. The resulting data can be displayed as an
image with the intensity in each region varying according to the binding affinity between targets and
probes.
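
One way to picture the outlier-removal step is shown below. This Python sketch assumes, purely for illustration, that an outlier is a pixel value lying more than three median absolute deviations from the median of the values collected at one substrate position; the text above does not specify the statistical criterion, and the function name and pixel values are hypothetical.

```python
import statistics


def trimmed_intensity(pixel_values, max_deviations=3.0):
    """Mean intensity at one array position after discarding outliers.

    A value is treated as an outlier when it lies more than `max_deviations`
    median absolute deviations (MAD) from the median. This criterion is an
    assumption chosen for the sketch.
    """
    values = list(pixel_values)
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # With zero spread, only values equal to the median are retained.
        kept = [v for v in values if v == median]
    else:
        kept = [v for v in values if abs(v - median) <= max_deviations * mad]
    return statistics.mean(kept)


# Hypothetical pixel intensities for one array element; 950.0 is an artifact.
pixels = [98.0, 101.0, 100.0, 99.0, 102.0, 950.0]
spot_intensity = trimmed_intensity(pixels)
```

The single aberrant pixel is discarded and the remaining five values average to 100, whereas a plain mean of all six values would have been inflated to above 240.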

In general, the test sample is classified as having a gene expression profile corresponding to that
associated with a disease or non-disease state by comparing the TEP generated from the test sample to
one or more REPs generated from reference samples (e.g., from samples associated with cancer or
specific stages of cancer, dysplasia, samples affected by a disease other than cancer, normal samples,
etc.). The criteria for a match or a substantial match between a TEP and a REP include expression of the
same or substantially the same set of reference genes, as well as expression of these reference genes at
substantially the same levels (e.g., no significant difference between the samples for a signal associated
with a selected reference sequence after normalization of the samples, or at least no greater than about
25% to about 40% difference in signal strength for a given reference sequence). In general, a pattern
match between a TEP and a REP includes a match in expression, preferably a match in qualitative or
quantitative expression level, of at least one of, all or any subset of the differentially expressed genes of
the invention.
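
The match criteria above lend themselves to a simple computational check. The following Python sketch is illustrative only: it assumes that the difference in signal strength is measured relative to the REP value, it requires the same set of reference genes in both patterns, and the gene names, labels, and normalized levels are hypothetical.

```python
def patterns_match(tep, rep, max_relative_difference=0.25):
    """True if both patterns cover the same genes at substantially the same levels.

    The difference is measured relative to the REP signal, which is an
    assumption made for this sketch.
    """
    if set(tep) != set(rep):
        return False
    for gene, reference_level in rep.items():
        if reference_level == 0:
            if tep[gene] != 0:
                return False
            continue
        difference = abs(tep[gene] - reference_level) / reference_level
        if difference > max_relative_difference:
            return False
    return True


def classify(tep, reference_patterns, max_relative_difference=0.25):
    """Return the labels of all REPs that the TEP matches."""
    return [
        label
        for label, rep in reference_patterns.items()
        if patterns_match(tep, rep, max_relative_difference)
    ]


# Hypothetical normalized expression levels.
reference_patterns = {
    "normal": {"geneA": 1.0, "geneB": 1.0},
    "tumor": {"geneA": 4.0, "geneB": 0.5},
}
test_pattern = {"geneA": 3.6, "geneB": 0.55}
matches = classify(test_pattern, reference_patterns)
```

In this example both genes of the test pattern fall within 25% of the hypothetical tumor REP, so only that label is returned; loosening the threshold toward 40% would admit more borderline patterns.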

Pattern matching can be performed manually, or can be performed using a computer program.
Methods for preparation of substrate matrices (e.g., arrays), design of oligonucleotides for use with such
matrices, labeling of probes, hybridization conditions, scanning of hybridized matrices, and analysis of 
patterns generated, including comparison analysis, are described in, for example, U.S. 5,800,992. 

Targets for inhibition of tumor growth. The polynucleotides of the invention can correspond to
therapeutic targets, and modulation of expression and/or activity of these targets can provide for
inhibition of tumor growth. For example, where overexpression of a gene is associated with tumor
growth or metastasis, the gene product is a suitable target for inhibition of its expression and/or
activity to facilitate inhibition of tumor growth or metastasis. The polynucleotides of the invention can
correspond to such genes, and thus in some embodiments the antisense of these polynucleotides can be
used to inhibit the expression of the gene and its corresponding gene product.

Diagnosis, Prognosis and Management of Cancer

The polynucleotides of the invention and their gene products are of particular interest as genetic
or biochemical markers (e.g., in blood or tissues) that will detect the earliest changes along the
carcinogenesis pathway and/or monitor the efficacy of various therapies and preventive interventions.
For example, the level of expression of certain polynucleotides can be indicative of a poorer prognosis,
and therefore warrant more aggressive chemo- or radio-therapy for a patient or vice versa. The
correlation of novel surrogate tumor specific features with response to treatment and outcome in patients
can define prognostic indicators that allow the design of tailored therapy based on the molecular profile
of the tumor. These therapies include antibody targeting and gene therapy. Determining expression of
certain polynucleotides and comparison of a patient's profile with known expression in normal tissue and
variants of the disease allows a determination of the best possible treatment for a patient, both in terms
of specificity of treatment and in terms of comfort level of the patient. Surrogate tumor markers, such as
polynucleotide expression, can also be used to better classify, and thus diagnose and treat, different
forms and disease states of cancer. Two classifications widely used in oncology that can benefit from
identification of the expression levels of the polynucleotides of the invention are staging of the cancerous
disorder, and grading the nature of the cancerous tissue.

The polynucleotides of the invention can be useful to monitor patients having or susceptible to 
cancer to detect potentially malignant events at a molecular level before they are detectable at a gross
morphological level. Furthermore, a polynucleotide of the invention identified as important for one type
of cancer can also have implications for development or risk of development of other types of cancer, 
e.g., where a polynucleotide is differentially expressed across various cancer types. Thus, for example, 
expression of a polynucleotide that has clinical implications for metastatic colon cancer can also have 
clinical implications for stomach cancer or endometrial cancer. 

Staging. Staging is a process used by physicians to describe how advanced the cancerous state
is in a patient. Staging assists the physician in determining a prognosis, planning treatment and
evaluating the results of such treatment. Staging systems vary with the types of cancer, but generally
involve the following "TNM" system: the type of tumor, indicated by T; whether the cancer has
metastasized to nearby lymph nodes, indicated by N; and whether the cancer has metastasized to more
distant parts of the body, indicated by M. Generally, if a cancer is only detectable in the area of the
primary lesion without having spread to any lymph nodes it is called Stage I. If it has spread only to the
closest lymph nodes, it is called Stage II. In Stage III, the cancer has generally spread to the lymph nodes
in near proximity to the site of the primary lesion. Cancers that have spread to a distant part of the body,
such as the liver, bone, brain or other site, are Stage IV, the most advanced stage.

The polynucleotides of the invention can facilitate fine-tuning of the staging process by
identifying markers for the aggressiveness of a cancer, e.g., the metastatic potential, as well as the presence in
different areas of the body. Thus, detection in a Stage II cancer of a polynucleotide signifying high metastatic
potential can be used to change a borderline Stage II tumor to a Stage III tumor, justifying more
aggressive therapy. Conversely, the presence of a polynucleotide signifying a lower metastatic potential
allows more conservative staging of a tumor.

Grading of cancers. Grade is a term used to describe how closely a tumor resembles normal
tissue of its same type. The microscopic appearance of a tumor is used to identify tumor grade based on
parameters such as cell morphology, cellular organization, and other markers of differentiation. As a
general rule, the grade of a tumor corresponds to its rate of growth or aggressiveness, with
undifferentiated or high-grade tumors being more aggressive than well differentiated or low-grade
tumors. The following guidelines are generally used for grading tumors: 1) GX Grade cannot be
assessed; 2) G1 Well differentiated; 3) G2 Moderately well differentiated; 4) G3 Poorly differentiated; 5)
G4 Undifferentiated. The polynucleotides of the invention can be especially valuable in determining the
grade of the tumor, as they not only can aid in determining the differentiation status of the cells of a
tumor, they can also identify factors other than differentiation that are valuable in determining the
aggressiveness of a tumor, such as metastatic potential.

Detection of lung cancer. The polynucleotides of the invention can be used to detect lung cancer 
in a subject. Although there are more than a dozen different kinds of lung cancer, the two main types of
lung cancer are small cell and nonsmall cell, which encompass about 90% of all lung cancer cases. Small
cell carcinoma (also called oat cell carcinoma) usually starts in one of the larger bronchial tubes, grows
fairly rapidly, and is likely to be large by the time of diagnosis. Nonsmall cell lung cancer (NSCLC) is
made up of three general subtypes of lung cancer. Epidermoid carcinoma (also called squamous cell
carcinoma) usually starts in one of the larger bronchial tubes and grows relatively slowly. The size of
these tumors can range from very small to quite large. Adenocarcinoma starts growing near the outside
surface of the lung and can vary in both size and growth rate. Some slowly growing adenocarcinomas are 
described as alveolar cell cancer. Large cell carcinoma starts near the surface of the lung, grows rapidly, 
and the growth is usually fairly large when diagnosed. Other less common forms of lung cancer are 
carcinoid, cylindroma, mucoepidermoid, and malignant mesothelioma. 

The polynucleotides of the invention, e.g., polynucleotides differentially expressed in normal
cells versus cancerous lung cells (e.g., tumor cells of high or low metastatic potential) or between types
of cancerous lung cells (e.g., high metastatic versus low metastatic), can be used to distinguish types of
lung cancer as well as to identify traits specific to a certain patient's cancer and to select an appropriate
therapy. For example, if the patient's biopsy expresses a polynucleotide that is associated with a low
metastatic potential, it may justify leaving a larger portion of the patient's lung in surgery to remove the
lesion. Alternatively, a smaller lesion with expression of a polynucleotide that is associated with high
metastatic potential may justify a more radical removal of lung tissue and/or the surrounding lymph
nodes, even if no metastasis can be identified through pathological examination.

Detection of breast cancer. The majority of breast cancers are adenocarcinoma subtypes, which
can be summarized as follows: 1) ductal carcinoma in situ (DCIS), including comedocarcinoma; 2)
infiltrating (or invasive) ductal carcinoma (IDC); 3) lobular carcinoma in situ (LCIS); 4) infiltrating (or
invasive) lobular carcinoma (ILC); 5) inflammatory breast cancer; 6) medullary carcinoma; 7) mucinous
carcinoma; 8) Paget's disease of the nipple; 9) Phyllodes tumor; and 10) tubular carcinoma.

The expression of polynucleotides of the invention can be used in the diagnosis and management
of breast cancer, as well as to distinguish between types of breast cancer. Breast cancer can
be detected using expression levels of any of the appropriate polynucleotides of the invention, either
alone or in combination. The aggressive nature and/or the metastatic potential of a
breast cancer can also be determined by comparing levels of one or more polynucleotides of the
invention with levels of another sequence known to vary in cancerous tissue, e.g., ER
expression. In addition, development of breast cancer can be detected by examining the ratio of
expression of a differentially expressed polynucleotide to the levels of steroid hormones (e.g.,
testosterone or estrogen) or to other hormones (e.g., growth hormone, insulin). Thus expression of
specific marker polynucleotides can be used to discriminate between normal and cancerous breast tissue,
to discriminate between breast cancers with different cells of origin, to discriminate between breast
cancers with different potential metastatic rates, etc.

Detection of colon cancer. The polynucleotides of the invention exhibiting the appropriate
expression pattern can be used to detect colon cancer in a subject. Colorectal cancer is one of the most
common neoplasms in humans and perhaps the most frequent form of hereditary neoplasia. Prevention
and early detection are key factors in controlling and curing colorectal cancer. Colorectal cancer begins
as polyps, which are small, benign growths of cells that form on the inner lining of the colon. Over a
period of several years, some of these polyps accumulate additional mutations and become cancerous.
Multiple familial colorectal cancer disorders have been identified, which are summarized as follows: 1)
Familial adenomatous polyposis (FAP); 2) Gardner's syndrome; 3) Hereditary nonpolyposis colon
cancer (HNPCC); and 4) Familial colorectal cancer in Ashkenazi Jews. The expression of appropriate
polynucleotides of the invention can be used in the diagnosis, prognosis and management of colorectal
cancer. Colon cancer can be detected using expression levels of any of these sequences,
alone or in combination. The aggressive nature and/or the
metastatic potential of a colon cancer can be determined by comparing levels of one or more
polynucleotides of the invention with total levels of another sequence known to vary in
cancerous tissue, e.g., expression of p53, DCC, ras, or FAP (see, e.g., Fearon ER, et al., Cell (1990)
61(5):759; Hamilton SR et al., Cancer (1993) 72:957; Bodmer W, et al., Nat Genet. (1994) 4(3):217;
Fearon ER, Ann N Y Acad Sci. (1995) 768:101). For example, development of colon cancer can be
detected by examining the ratio of any of the polynucleotides of the invention to the levels of oncogenes
(e.g., ras) or tumor suppressor genes (e.g., FAP or p53). Thus expression of specific marker
polynucleotides can be used to discriminate between normal and cancerous colon tissue, to discriminate
between colon cancers with different cells of origin, to discriminate between colon cancers with different
potential metastatic rates, etc.

Detection of prostate cancer. The polynucleotides and their corresponding genes and gene
products exhibiting the appropriate differential expression pattern can be used to detect prostate cancer
in a subject. Over 95% of primary prostate cancers are adenocarcinomas. Signs and symptoms may
include: frequent urination, especially at night, inability to urinate, trouble starting or holding back
urination, a weak or interrupted urine flow and frequent pain or stiffness in the lower back, hips or upper
thighs.

Many of the signs and symptoms of prostate cancer can be caused by a variety of other
non-cancerous conditions. For example, one common cause of many of these signs and symptoms is a
condition called benign prostatic hypertrophy, or BPH. In BPH, the prostate gets bigger and may block
the flow of urine or interfere with sexual function. The methods and compositions of the invention can
be used to distinguish between prostate cancer and such non-cancerous conditions. The methods of the
invention can be used in conjunction with conventional methods of diagnosis, e.g., digital rectal exam
and/or detection of the level of prostate specific antigen (PSA), a substance produced and secreted by the
prostate.

Use of Polynucleotides to Screen for Peptide Analogs and Antagonists

Polypeptides encoded by the instant polynucleotides and corresponding full length genes can be
used to screen peptide libraries to identify binding partners, such as receptors, from among the encoded
polypeptides. Peptide libraries can be synthesized according to methods known in the art (see, e.g.,
USPN 5,010,175, and WO 91/17823). Agonists or antagonists of the polypeptides of the invention can
be screened using any available method known in the art, such as signal transduction, antibody binding,
receptor binding, mitogenic assays, chemotaxis assays, etc. The assay conditions ideally should
resemble the conditions under which the native activity is exhibited in vivo, that is, under physiologic
pH, temperature, and ionic strength. Suitable agonists or antagonists will exhibit strong inhibition or
enhancement of the native activity at concentrations that do not cause toxic side effects in the subject.
Agonists or antagonists that compete for binding to the native polypeptide can require concentrations
equal to or greater than the native concentration, while inhibitors capable of binding irreversibly to the
polypeptide can be added in concentrations on the order of the native concentration.

Such screening and experimentation can lead to identification of a novel polypeptide binding
partner, such as a receptor, encoded by a gene or a cDNA corresponding to a polynucleotide of the
invention, and at least one peptide agonist or antagonist of the novel binding partner. Such agonists and
antagonists can be used to modulate, enhance, or inhibit receptor function in cells to which the receptor is
native, or in cells that possess the receptor as a result of genetic engineering. Further, if the novel
receptor shares biologically important characteristics with a known receptor, information about
agonist/antagonist binding can facilitate development of improved agonists/antagonists of the known
receptor.

Pharmaceutical Compositions and Therapeutic Uses 

Pharmaceutical compositions of the invention can comprise polypeptides, antibodies, or 
polynucleotides (including antisense nucleotides and ribozymes) of the claimed invention in a 
therapeutically effective amount. The term "therapeutically effective amount" as used herein refers to an 
amount of a therapeutic agent sufficient to treat, ameliorate, or prevent a desired disease or condition, or to exhibit 
a detectable therapeutic or preventative effect. The effect can be detected by, for example, chemical 
markers or antigen levels. Therapeutic effects also include reduction in physical symptoms, such as 
decreased body temperature. The precise effective amount for a subject will depend upon the subject's 
size and health, the nature and extent of the condition, and the therapeutics or combination of 
therapeutics selected for administration. Thus, it is not useful to specify an exact effective amount in 
advance. However, the effective amount for a given situation is determined by routine experimentation 
and is within the judgment of the clinician. For purposes of the present invention, an effective dose will 
generally be from about 0.01 mg/kg to 50 mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA 
constructs in the individual to which they are administered. 
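As a purely arithmetic illustration of the weight-based range stated above, the following sketch converts a per-kilogram range into absolute amounts for a given body weight. The function name `dose_range_mg` and the 70 kg example weight are assumptions introduced here for illustration; they do not appear in the specification, and the output is not dosing guidance.

```python
def dose_range_mg(weight_kg, low_mg_per_kg=0.01, high_mg_per_kg=50.0):
    """Return (low, high) absolute amounts in mg for a body weight in kg.

    The defaults follow the broader range quoted in the text (about
    0.01 mg/kg to 50 mg/kg). The function itself is hypothetical.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    if low_mg_per_kg > high_mg_per_kg:
        raise ValueError("low bound exceeds high bound")
    return weight_kg * low_mg_per_kg, weight_kg * high_mg_per_kg


if __name__ == "__main__":
    # Broader range, hypothetical 70 kg subject.
    low, high = dose_range_mg(70.0)
    print(f"broad:  {low:.2f} mg to {high:.2f} mg")
    # Narrower range quoted in the text (0.05 mg/kg to about 10 mg/kg).
    low, high = dose_range_mg(70.0, 0.05, 10.0)
    print(f"narrow: {low:.2f} mg to {high:.2f} mg")
```

The calculation is a simple product of body weight and the per-kilogram bound; as the text notes, the actual effective amount is left to routine experimentation and clinical judgment.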

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term 
"pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent, such as 
antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any pharmaceutical 
carrier that does not itself induce the production of antibodies harmful to the individual receiving the 
composition, and which can be administered without undue toxicity. Suitable carriers can be large, 
slowly metabolized macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic 
acids, polymeric amino acids, amino acid copolymers, and inactive virus particles. Such carriers are well 
known to those of ordinary skill in the art. Pharmaceutically acceptable carriers in therapeutic 
compositions can include liquids such as water, saline, glycerol and ethanol. Auxiliary substances, such 
as wetting or emulsifying agents, pH buffering substances, and the like, can also be present in such 
vehicles. Typically, the therapeutic compositions are prepared as injectables, either as liquid solutions or 
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection can 
also be prepared. Liposomes are included within the definition of a pharmaceutically acceptable carrier. 

Pharmaceutically acceptable salts can also be present in the pharmaceutical composition, e.g., mineral 
acid salts such as hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of 
organic acids such as acetates, propionates, malonates, benzoates, and the like. A thorough discussion of 
pharmaceutically acceptable excipients is available in Remington's Pharmaceutical Sciences (Mack 
Pub. Co., N.J. 1991). 

Delivery Methods. Once formulated, the compositions of the invention can be (1) administered 
directly to the subject (e.g., as polynucleotides or polypeptides); or (2) delivered ex vivo, to cells derived 
from the subject (e.g., as in ex vivo gene therapy). Direct delivery of the compositions will generally be 
accomplished by parenteral injection, e.g., subcutaneously, intraperitoneally, intravenously, 
intramuscularly, intratumorally, or to the interstitial space of a tissue. Other modes of administration 
include oral and pulmonary administration, suppositories, and transdermal applications, needles, and 
gene guns or hyposprays. Dosage treatment can be a single dose schedule or a multiple dose schedule. 

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are 
known in the art and described in, e.g., International Publication No. WO 93/14778. Examples of cells 
useful in ex vivo applications include, for example, stem cells, particularly hematopoietic, lymph cells, 
macrophages, dendritic cells, or tumor cells. Generally, delivery of nucleic acids for both ex vivo and in 
vitro applications can be accomplished by, for example, dextran-mediated transfection, calcium 
phosphate precipitation, polybrene-mediated transfection, protoplast fusion, electroporation, 
encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei, all 
well known in the art. 

Once a gene corresponding to a polynucleotide of the invention has been found to correlate with 
a proliferative disorder, such as neoplasia, dysplasia, and hyperplasia, the disorder can be amenable to 
treatment by administration of a therapeutic agent based on the provided polynucleotide, corresponding 
polypeptide or other corresponding molecule (e.g., antisense, ribozyme, etc.). 

The dose and the means of administration of the inventive pharmaceutical compositions are 
determined based on the specific qualities of the therapeutic composition, the condition, age, and weight 
of the patient, the progression of the disease, and other relevant factors. For example, administration of 
polynucleotide therapeutic compositions of the invention includes local or systemic 
administration, including injection, oral administration, particle gun or catheterized administration, and 
topical administration. Preferably, the therapeutic polynucleotide composition contains an expression 
construct comprising a promoter operably linked to a polynucleotide of at least 12, 22, 25, 30, or 35 
contiguous nt of the polynucleotide disclosed herein. Various methods can be used to administer the 
therapeutic composition directly to a specific site in the body. For example, a small metastatic lesion is 
located and the therapeutic composition injected several times in several different locations within the 
body of the tumor. Alternatively, arteries which serve a tumor are identified, and the therapeutic 
composition injected into such an artery, in order to deliver the composition directly into the tumor. A 
tumor that has a necrotic center is aspirated and the composition injected directly into the now empty 
center of the tumor. The antisense composition is directly administered to the surface of the tumor, for 
example, by topical application of the composition. X-ray imaging is used to assist in certain of the 
above delivery methods. 

Receptor-mediated targeted delivery of therapeutic compositions containing an antisense 
polynucleotide, subgenomic polynucleotides, or antibodies to specific tissues can also be used. 
Receptor-mediated DNA delivery techniques are described in, for example, Findeis et al., Trends 
Biotechnol. (1993) 11:202; Chiou et al., Gene Therapeutics: Methods And Applications Of Direct 
Gene Transfer (J.A. Wolff, ed.) (1994); Wu et al., J. Biol. Chem. (1988) 263:621; Wu et al., J. Biol. 
Chem. (1994) 269:542; Zenke et al., Proc. Natl. Acad. Sci. (USA) (1990) 87:3655; Wu et al., J. Biol. 
Chem. (1991) 266:338. Therapeutic compositions containing a polynucleotide are administered in a 
range of about 100 ng to about 200 mg of DNA for local administration in a gene therapy protocol. 
Concentration ranges of about 500 ng to about 50 mg, about 1 ng to about 2 mg, about 5 ng to about 
500 ng, and about 20 ng to about 100 ng of DNA can also be used during a gene therapy protocol. 
Factors such as method of action (e.g., for enhancing or inhibiting levels of the encoded gene product) 
and efficacy of transformation and expression are considerations which will affect the dosage required 
for ultimate efficacy of the antisense subgenomic polynucleotides. Where greater expression is desired 
over a larger area of tissue, larger amounts of antisense subgenomic polynucleotides or the same 
amounts readministered in a successive protocol of administrations, or several administrations to 
different adjacent or close tissue portions of, for example, a tumor site, may be required to effect a 
positive therapeutic outcome. In all cases, routine experimentation in clinical trials will determine 
specific ranges for optimal therapeutic effect. For polynucleotide related genes encoding polypeptides or 
proteins with anti-inflammatory activity, suitable use, doses, and administration are described in USPN 
5,654,173. 

The therapeutic polynucleotides and polypeptides of the present invention can be delivered using 
gene delivery vehicles. The gene delivery vehicle can be of viral or non-viral origin (see generally, Jolly, 
Cancer Gene Therapy (1994) 1:51; Kimura, Human Gene Therapy (1994) 5:845; Connelly, Human 
Gene Therapy (1995) 1:185; and Kaplitt, Nature Genetics (1994) 6:148). Expression of such coding 
sequences can be induced using endogenous mammalian or heterologous promoters. Expression of the 
coding sequence can be either constitutive or regulated. 

Viral-based vectors for delivery of a desired polynucleotide and expression in a desired cell are 
well known in the art. Exemplary viral-based vehicles include, but are not limited to, recombinant 
retroviruses (see, e.g., WO 90/07936; WO 94/03622; WO 93/25698; WO 93/25234; USPN 
5,219,740; WO 93/11230; WO 93/10218; USPN 4,777,127; GB Patent No. 2,200,651; EP 0 345 242; 
and WO 91/02805), alphavirus-based vectors (e.g., Sindbis virus vectors, Semliki forest virus (ATCC 
VR-67; ATCC VR-1247), Ross River virus (ATCC VR-373; ATCC VR-1246) and Venezuelan equine 
encephalitis virus (ATCC VR-923; ATCC VR-1250; ATCC VR-1249; ATCC VR-532)), and 
adeno-associated virus (AAV) vectors (see, e.g., WO 94/12649; WO 93/03769; WO 93/19191; WO 
94/28938; WO 95/11984 and WO 95/00655). Administration of DNA linked to killed adenovirus as 
described in Curiel, Hum. Gene Ther. (1992) 3:147 can also be employed. 

Non-viral delivery vehicles and methods can also be employed, including, but not limited to, 
polycationic condensed DNA linked or unlinked to killed adenovirus alone (see, e.g., Curiel, Hum. Gene 
Ther. (1992) 3:147); ligand-linked DNA (see, e.g., Wu, J. Biol. Chem. (1989) 264:16985); eukaryotic 
cell delivery vehicles (see, e.g., USPN 5,814,482; WO 95/07994; WO 96/17072; WO 95/30763; 
and WO 97/42338); and nucleic charge neutralization or fusion with cell membranes. Naked DNA can 
also be employed. Exemplary naked DNA introduction methods are described in WO 90/11092 and 
USPN 5,580,859. Liposomes that can act as gene delivery vehicles are described in USPN 5,422,120; 
WO 95/13796; WO 94/23697; WO 91/14445; and EP 0524968. Additional approaches are described 
in Philip, Mol. Cell Biol. (1994) 14:2411, and in Woffendin, Proc. Natl. Acad. Sci. (1994) 91:1581. 

Further non-viral delivery suitable for use includes mechanical delivery systems such as the 
approach described in Woffendin et al., Proc. Natl. Acad. Sci. USA (1994) 91(24):11581. Moreover, 
the coding sequence and the product of expression of such can be delivered through deposition of 
photopolymerized hydrogel materials or use of ionizing radiation (see, e.g., USPN 5,206,152 and WO 
92/11033). Other conventional methods for gene delivery that can be used for delivery of the coding 
sequence include, for example, use of a hand-held gene transfer particle gun (see, e.g., USPN 5,149,655) 
and use of ionizing radiation for activating a transferred gene (see, e.g., USPN 5,206,152 and WO 92/11033). 

The present invention will now be illustrated by reference to the following examples which set 
forth particularly advantageous embodiments. However, it should be noted that these embodiments are 
illustrative and are not to be construed as restricting the invention in any way. 

EXAMPLES 

The following examples are offered primarily for purposes of illustration. It will be readily 
apparent to those skilled in the art that the formulations, dosages, methods of administration, and other 
parameters of this invention may be further modified or substituted in various ways without departing 
from the spirit and scope of the invention. 
Example 1: Source of Biological Materials and Overview of Novel Polynucleotides Expressed by 
the Biological Materials 

cDNA libraries were constructed from mRNA isolated from the cell lines indicated in Table 4. 
The specific library from which any polynucleotide was isolated is indicated in Table 1, with the number 
of the entry under the "LIBRARY" column correlating to the library number in Table 4. 
Polynucleotides expressed by the selected cell lines were isolated and analyzed; the sequences of these 
polynucleotides were about 275-300 nucleotides in length. 

The sequences of the isolated polynucleotides were first masked to eliminate low complexity 
sequences using the XBLAST masking program (Claverie "Effective Large-Scale Sequence Similarity 
Searches," In: Computer Methods for Macromolecular Sequence Analysis, Doolittle, ed., Meth. 
Enzymol. 266:212-227 Academic Press, NY, NY (1996); see particularly Claverie, in "Automated DNA 
Sequencing and Analysis Techniques" Adams et al., eds., Chap. 36, p. 267 Academic Press, San Diego, 
1994 and Claverie et al. Comput. Chem. (1993) 17:191). Generally, masking does not influence the 
final search results, except to eliminate sequences of relatively little interest due to their low complexity, 
and to eliminate multiple "hits" based on similarity to repetitive regions common to multiple sequences, 
e.g., Alu repeats. The remaining sequences were then used in a BLASTN vs. GenBank search; 
sequences that exhibited greater than 70% overlap, 99% identity, and a p value of less than 1 x 10^-40 
were discarded. Sequences from this search also were discarded if the inclusive parameters were met, 
but the sequence was ribosomal or vector-derived. 
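The discard rule for the BLASTN vs. GenBank search can be sketched as a simple threshold filter. This is a minimal illustration under stated assumptions: the `Hit` record, its field names, and the function names are hypothetical, "99% identity" is read here as "at least 99%", and the separate ribosomal/vector rule is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Best database hit for one query (hypothetical record layout)."""
    overlap_pct: float   # percent of the query covered by the alignment
    identity_pct: float  # percent identity within the alignment
    p_value: float       # BLAST p value of the alignment


def is_already_known(hit, min_overlap=70.0, min_identity=99.0, max_p=1e-40):
    """True when a hit meets all three discard thresholds from the text."""
    return (
        hit.overlap_pct > min_overlap
        and hit.identity_pct >= min_identity  # assumption: 99% is inclusive
        and hit.p_value < max_p
    )


def keep_novel(best_hits):
    """Return query names that survive the filter.

    best_hits maps a query name to its best Hit, or to None when the
    search returned no hit at all.
    """
    return [
        name
        for name, hit in best_hits.items()
        if hit is None or not is_already_known(hit)
    ]


if __name__ == "__main__":
    example = {
        "query_a": None,                     # no hit: kept
        "query_b": Hit(95.0, 99.5, 1e-60),   # meets all thresholds: discarded
        "query_c": Hit(50.0, 99.5, 1e-60),   # overlap too low: kept
    }
    print(keep_novel(example))
```

All three conditions must hold together for a sequence to be discarded, so a near-identical but short alignment (low overlap) does not remove a query on its own.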
The resulting sequences from the previous search were classified into three groups (1, 2 and 3 
below) and searched in a BLASTX vs. NRP (non-redundant proteins) database search: (1) unknown (no 
hits in the GenBank search), (2) weak similarity (greater than 45% identity and p value of less than 1 x 
10^-5), and (3) high similarity (greater than 60% overlap, greater than 80% identity, and p value less than 
1 x 10^-5). Sequences having greater than 70% overlap, greater than 99% identity, and p value of less 
than 1 x 10^-40 were discarded. 
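The three-way grouping described above can be expressed as a small decision function. The sketch below is illustrative only: the function name and the tuple layout are hypothetical, and treating a hit that meets neither threshold set as "unknown" is an assumption, since the text does not say how such hits were handled.

```python
def classify_similarity(hit):
    """Classify a query by its best hit.

    hit is None (no hit) or a tuple (overlap_pct, identity_pct, p_value).
    Thresholds follow the text; the fallback for sub-threshold hits is an
    assumption made for this sketch.
    """
    if hit is None:
        return "unknown"
    overlap_pct, identity_pct, p_value = hit
    # Check the stricter category first so it takes precedence.
    if overlap_pct > 60 and identity_pct > 80 and p_value < 1e-5:
        return "high similarity"
    if identity_pct > 45 and p_value < 1e-5:
        return "weak similarity"
    return "unknown"  # assumption: sub-threshold hits are grouped with no-hit


if __name__ == "__main__":
    for hit in (None, (75.0, 90.0, 1e-20), (30.0, 50.0, 1e-8), (30.0, 40.0, 1e-2)):
        print(hit, "->", classify_similarity(hit))
```

Testing the high-similarity thresholds before the weak-similarity ones matters because every high-similarity hit also satisfies the weak-similarity criteria.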

The remaining sequences were classified as unknown (no hits), weak similarity, and high 
similarity (parameters as above). Two searches were performed on these sequences. First, a BLAST vs. 
EST database search was performed and sequences with greater than 99% overlap, greater than 99% 
similarity and a p value of less than 1 x 10^-40 were discarded. Sequences with a p value of less than 1 x 
10^-65 when compared to a database sequence of human origin were also excluded. Second, a BLASTN 
vs. Patent GeneSeq database search was performed and sequences having greater than 99% identity, p value 
less than 1 x 10^-40, and greater than 99% overlap were discarded. 

The remaining sequences were subjected to screening using other rules and redundancies in the 
dataset. Sequences with a p value of less than 1 x 10 ^ ^ in relation to a database sequence of human 
origin were specifically excluded. The final result provided the 2396 sequences listed as SEQ ID 
NOS: 1-2396 in the accompanying Sequence Listing and summarized in Table 1 (inserted prior to 
claims). Each identified polynucleotide represents sequence from at least a partial mRNA transcript. 

Table 1 provides: 1) the SEQ ID NO assigned to each sequence for use in the present 
specification; 2) the cluster to which the sequence is assigned; 3) the sequence name used as an internal 
identifier of the sequence; 4) the orientation of the insert in the clone (F=forward; R=reverse); 5) the 
name assigned to the clone from which the sequence was isolated; and 6) the library from which the 
sequence was originally isolated. Because the provided polynucleotides represent partial mRNA 
transcripts, two or more polynucleotides of the invention may represent different regions of the same 
mRNA transcript and the same gene. Thus, if two or more SEQ ID NOS are identified as belonging to 
the same clone, then either sequence can be used to obtain the full-length mRNA or gene. 
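The relationship between SEQ ID NOS and clones described above amounts to grouping rows of Table 1 by clone name. A minimal sketch follows; the row layout, the function name, and the clone names and SEQ ID values in the example are hypothetical placeholders, not entries from Table 1.

```python
from collections import defaultdict


def shared_clones(rows):
    """Map each clone name to the SEQ ID NOS isolated from it.

    rows is an iterable of (seq_id, clone_name) pairs. Only clones that
    contributed two or more sequences are returned, since those are the
    cases where either sequence could serve as a starting point.
    """
    by_clone = defaultdict(list)
    for seq_id, clone_name in rows:
        by_clone[clone_name].append(seq_id)
    return {clone: ids for clone, ids in by_clone.items() if len(ids) > 1}


if __name__ == "__main__":
    # Placeholder rows; real values would come from Table 1.
    table_rows = [(1, "clone_A"), (2, "clone_B"), (3, "clone_A"), (4, "clone_C")]
    print(shared_clones(table_rows))
```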


Example 2: Results of Public Database Search to Identify Function of Gene Products 

SEQ ID NOS: 1-2396 were translated in all three reading frames, and the nucleotide sequences 
and translated amino acid sequences used as query sequences to search for homologous sequences in 
either the GenBank (nucleotide sequences) or Non-Redundant Protein (amino acid sequences) databases. 
Query and individual sequences were aligned using the BLAST 2.0 programs (National Center for 
Biotechnology Information, Bethesda, Maryland; see also Altschul, et al. Nucleic Acids Res. 
(1997) 25:3389-3402). The sequences were masked to various extents to prevent searching of repetitive 
sequences or poly-A sequences, using the XBLAST program for masking low complexity as described 
above in Example 1. 
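Translation in three reading frames can be illustrated with a short, self-contained sketch. It assumes the standard genetic code and translates only the three forward frames of the sequence as given; the function names and the nine-nucleotide example are hypothetical and are not taken from the Sequence Listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame):
    """Translate one forward frame (0, 1, or 2); '*' marks a stop codon.

    Codons containing ambiguous bases are rendered as 'X'. Trailing bases
    that do not fill a codon are ignored.
    """
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def translate_three_frames(seq):
    """Return the translations of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(translate_three_frames("ATGGCCTAA")):
        print(f"frame {frame + 1}: {protein}")
```

Each frame simply shifts the starting position by one nucleotide, so a partial transcript whose true frame is unknown still yields one translation that can be compared against a protein database.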

Tables 2A and 2B (inserted before the claims) provide the alignment summaries having a p 
value of 1 x 10^-2 or less indicating substantial homology between the sequences of the present invention 
and those of the indicated public databases. Table 2A provides the SEQ ID NO of the query sequence, 
the accession number of the GenBank database entry of the homologous sequence, and the p value of the 
alignment. Table 2B provides the SEQ ID NO of the query sequence, the accession number of the 
Non-Redundant Protein database entry of the homologous sequence, and the p value of the alignment. The 
alignments provided in Tables 2A and 2B are the best available alignment to a DNA or amino acid 
sequence at a time just prior to filing of the present specification. The activity of the polypeptide 
encoded by the SEQ ID NOS listed in Tables 2A and 2B can be extrapolated to be substantially the 
same or substantially similar to the activity of the reported nearest neighbor or closely related sequence. 
The accession number of the nearest neighbor is reported, providing a publicly available reference to the 
activities and functions exhibited by the nearest neighbor. The public information regarding the 
activities and functions of each of the nearest neighbor sequences is incorporated by reference in this 
application. Also incorporated by reference is all publicly available information regarding the sequence, 
as well as the putative and actual activities and functions of the nearest neighbor sequences listed in 
Table 2B and their related sequences. The search program and database used for the alignment, as well 
as the calculation of the p value, are also indicated. 
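The selection of a "nearest neighbor" per query, subject to the p value cutoff stated above, can be sketched as follows. The function name, the tuple layout, and the accession strings in the example are hypothetical; choosing the lowest p value as "best" is an assumption consistent with, but not spelled out in, the text.

```python
def nearest_neighbors(alignments, max_p=1e-2):
    """Pick the best-scoring database entry for each query.

    alignments is an iterable of (seq_id, accession, p_value) tuples.
    Alignments with p_value above max_p are ignored. Returns a dict that
    maps seq_id to (accession, p_value) for the lowest p value seen.
    """
    best = {}
    for seq_id, accession, p_value in alignments:
        if p_value > max_p:
            continue  # not reported: fails the 1 x 10^-2 cutoff
        if seq_id not in best or p_value < best[seq_id][1]:
            best[seq_id] = (accession, p_value)
    return best


if __name__ == "__main__":
    # Placeholder accessions; real values would come from Tables 2A and 2B.
    example = [
        (1, "ACC_0001", 1e-30),
        (1, "ACC_0002", 1e-50),
        (2, "ACC_0003", 0.5),
    ]
    print(nearest_neighbors(example))
```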

Full length sequences or fragments of the polynucleotide sequences of the nearest neighbors can 
be used as probes and primers to identify and isolate the full length sequence of the corresponding 
polynucleotide. The nearest neighbors can indicate a tissue or cell type to be used to construct a library 
for the full-length sequences of the corresponding polynucleotides. 

Example 3: Members of Protein Families 

SEQ ID NOS: 1-2396 were used to conduct a profile search as described in the specification 
above. Several of the polynucleotides of the invention were found to encode polypeptides having 
characteristics of a polypeptide belonging to a known protein family (and thus represent members of 
these protein families) and/or comprising a known functional domain. Table 3 provides the SEQ ID NO 
of the query sequence, the profile name, and a brief description of the profile hit. 



Table 3 

SEQ ID   Profile name          Description 
410      ATPases               ATPases Associated with Various Cellular Activities 
537      ATPases               ATPases Associated with Various Cellular Activities 
539      ATPases               ATPases Associated with Various Cellular Activities 
540      ATPases               ATPases Associated with Various Cellular Activities 
662      Rrm                   RNA recognition motif (aka RRM, RBD, or RNP domain) 
683      Rrm                   RNA recognition motif (aka RRM, RBD, or RNP domain) 
707      dualspecphosphatase   Dual specificity phosphatase, catalytic domain 
         Rrm                   RNA recognition motif (aka RRM, RBD, or RNP domain) 
719      Efhand                EF-hand 
738      ATPases               ATPases Associated with Various Cellular Activities 
779      Zincfing_C2H2         Zinc finger, C2H2 type 
781      Rrm                   RNA recognition motif (aka RRM, RBD, or RNP domain) 
783      Rrm                   RNA recognition motif (aka RRM, RBD, or RNP domain) 
1110     WD_domain             WD domain, G-beta repeats 
1415     Dead_box_helic        DEAD and DEAH box helicases 
1533     C2                    C2 domain (prot. kinase C like) 
1633     dualspecphosphatase   Dual specificity phosphatase, catalytic domain 
1637     Dead_box_helic        DEAD and DEAH box helicases 
1638     Dead_box_helic        DEAD and DEAH box helicases 
1744     WD_domain             WD domain, G-beta repeats 
1759     BZIP                  Basic region plus leucine zipper transcription factors 
1993     WD_domain             WD domain, G-beta repeats 
2083     WD_domain             WD domain, G-beta repeats 
2209     ATPases               ATPases Associated with Various Cellular Activities 
2228     Ras                   Ras family 
2287     Ras                   Ras family 
2300     neur_chan             Neurotransmitter-gated ion-channel 
2302     tor_domain2           kinase domain of tors (Christoph Reinhard) 
2306     homeobox              Homeobox Domain 
2318     metalthio             Metallothioneins 
2327     Asp                   Eukaryotic aspartyl proteases 

Some polynucleotides exhibited multiple profile hits where the query sequence contains 
overlapping profile regions, and/or where the sequence contains two different functional domains. Each 
of the profile hits of Table 3 is described in more detail below. The acronyms for the profiles (provided 
in parentheses) are those used to identify the profile in the Pfam and Prosite databases. The Pfam 
database can be accessed through web sites supported by Washington University, St. Louis 
(Missouri); The Sanger Centre (United Kingdom); and The Karolinska Institute Center for Genomics 
Research. The Prosite database is publicly available through the ExPASy Molecular Biology Server. 
The public information available on the Pfam and Prosite databases regarding the various profiles, 
including but not limited to the activities, function, and consensus sequences of various protein families 
and protein domains, is incorporated herein by reference. 

Eukaryotic Aspartyl Proteases (asp; Pfam Accession No. PF00026). SEQ ID NO:2327 
corresponds to a gene encoding a novel eukaryotic aspartyl protease. Aspartyl proteases, known as acid 
proteases (EC 3.4.23.-), are a widely distributed family of proteolytic enzymes (Foltmann B., Essays 
Biochem. (1981) 17:52; Davies D.R., Annu. Rev. Biophys. Chem. (1990) 19:189; Rao J.K.M. et al., 
Biochemistry (1991) 30:4663) known to exist in vertebrates, fungi, plants, retroviruses and some plant 
viruses. Aspartate proteases of eukaryotes are monomeric enzymes which consist of two domains. Each 
domain contains an active site centered on a catalytic aspartyl residue. The consensus pattern to identify 
eukaryotic aspartyl protease is: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]- 
[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA], where D is the active site residue. 

ATPases Associated with Various Cellular Activities (ATPases: Pfam Accession No. PF00004). 
SEQ ID NOS:410, 537, 539, 540, 738, and 2209 correspond to a sequence that encodes a member of a 
family of ATPases Associated with diverse cellular Activities (AAA). The AAA protein family is 
composed of a large number of ATPases that share a conserved region of about 220 amino acids 
containing an ATP-binding site (Froehlich et al., J. Cell Biol. (1991) 114:443; Erdmann et al., Cell 
(1991) 64:499; Peters et al., EMBO J. (1990) 9:1757; Kunau et al., Biochimie (1993) 75:209-224; 
Confalonieri et al., BioEssays (1995) 17:639; see also the AAA Server Homepage). The AAA domain, 
which can be present in one or two copies, acts as an ATP-dependent protein clamp (Confalonieri et al. 
(1995) BioEssays 17:639) and contains a highly conserved region located in the central part of the 
domain. The consensus pattern is: [LIVMT]-x-[LIVMT]-[L^ 
[LIVM]-D-x-A-[LIFA]-x-R. 

Basic Region Plus Leucine Zipper Transcription Factors (BZIP: Pfam Accession No. PF00170). 
SEQ ID NO:1759 represents a polynucleotide encoding a novel member of the family of basic region 
plus leucine zipper transcription factors. The bZIP superfamily (Hurst, Protein Prof. (1995) 2:105; and 
Ellenberger, Curr. Opin. Struct. Biol. (1994) 4:12) of eukaryotic DNA-binding transcription factors 
encompasses proteins that contain a basic region mediating sequence-specific DNA-binding followed by 
a leucine zipper required for dimerization. The consensus pattern for this protein family is: [KR]-x(1,3)- 
[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK]. 

C2 domain (C2: Pfam Accession No. PF00168). SEQ ID NO:1533 corresponds to a sequence 
encoding a C2 domain, which is involved in calcium-dependent phospholipid binding (Davletov J. Biol. 
Chem. (1993) 268:26386-26390) or, in proteins that do not bind calcium, the domain may facilitate 
binding to inositol-1,3,4,5-tetraphosphate (Fukuda et al. J. Biol. Chem. (1994) 269:29206-29211; 
Sutton et al. Cell (1995) 80:929-938). The consensus sequence is: [ACG]-x(2)-L-x(2,3)-D-x(1,2)- 
[NGSTLDF]-[GTMR]-x-[STAP]-D-[PA]-[FY]. 
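Consensus patterns written in this notation can be scanned against a translated sequence by converting them to regular expressions. The sketch below is one possible conversion, not the method used by the Pfam or Prosite tools: the function name `prosite_to_regex` is hypothetical, it handles only the elements that appear in the patterns quoted in this section (single residues, `x`, bracketed alternatives, braced exclusions, and `(n)` or `(n,m)` repeat counts), and the test peptide is a synthetic string built to match.

```python
import re

# One pattern element: a core symbol plus an optional repeat count.
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Convert a dash-separated consensus pattern to a regex string."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            atom = "."                        # any residue
        elif core.startswith("{"):
            atom = "[^" + core[1:-1] + "]"    # any residue except these
        else:
            atom = core                       # literal residue or [set]
        if low is not None:
            atom += "{" + low + ("," + high if high else "") + "}"
        pieces.append(atom)
    return "".join(pieces)


# C2 domain consensus sequence as quoted in the text.
C2_PATTERN = "[ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLDF]-[GTMR]-x-[STAP]-D-[PA]-[FY]"

if __name__ == "__main__":
    regex = prosite_to_regex(C2_PATTERN)
    print(regex)
    synthetic_peptide = "MMAKKLKKDKNGKSDPFMM"  # built by hand to match
    print(bool(re.search(regex, synthetic_peptide)))
```

A match of this kind only flags a candidate site; the profile searches described in this example rely on the curated Pfam and Prosite resources rather than on a bare pattern scan.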

DEAD and DEAH box families ATP-dependent helicases (Dead_box_helic: Pfam Accession 
No. PF00270). SEQ ID NOS:1415, 1637, and 1638 represent polynucleotides encoding a novel 
member of the DEAD and DEAH box families (Schmid et al., Mol. Microbiol. (1992) 6:283; Linder et 
al., Nature (1989) 337:121; Wassarman et al., Nature (1991) 349:463). All members of these families 
are involved in ATP-dependent, nucleic-acid unwinding. All DEAD box family members share a 
number of conserved sequence motifs, some of which are specific to the DEAD family, with others 
shared by other ATP-binding proteins or by proteins belonging to the helicases 'superfamily' (Hodgman 
Nature (1988) 333:22 and Nature (1988) 333:578 (Errata)). One of these motifs, called the 'D-E-A-D- 
box', represents a special version of the B motif of ATP-binding proteins. Proteins that have His instead 
of the second Asp are 'D-E-A-H-box' proteins (Wassarman et al., Nature (1991) 349:463; Harosh 
et al., Nucleic Acids Res. (1991) 19:6331; Koonin et al., J. Gen. Virol. (1992) 73:989). The following 
signature patterns are used to identify members of both subfamilies: 1) [LIVMF](2)-D-E-A-D-[RKEN]- 
x-[LIVMFYGSTN]; and 2) [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]. 

Dual specificity phosphatase (DSPc: Pfam Accession No. PF00782). SEQ ID NOS:707 and 
1633 correspond to sequences that encode members of a family of dual specificity phosphatases (DSPs). 
DSPs are Ser/Thr and Tyr protein phosphatases that comprise a tertiary fold highly similar to that of 
tyrosine-specific phosphatases, except for a "recognition" region connecting helix alpha1 to strand 
beta1. This tertiary fold may determine differences in substrate specificity between VH-1 related dual 
specificity phosphatase (VHR), the protein tyrosine phosphatases (PTPs), and other DSPs. 
Phosphatases are important in the control of cell growth, proliferation, differentiation and 
transformation. 

EF Hand (Efhand: Pfam Accession No. PF00036). SEQ ID NO:719 corresponds to a 
polynucleotide encoding a member of the EF-hand protein family, a calcium binding domain shared by 
many calcium-binding proteins belonging to the same evolutionary family (Kawasaki et al., Protein 
Prof. (1995) 2:305-490). The domain is a twelve residue loop flanked on both sides by a twelve residue 
alpha-helical domain, with a calcium ion coordinated in a pentagonal bipyramidal configuration. The six 
residues involved in the binding are in positions 1, 3, 5, 7, 9 and 12; these residues are denoted by X, Y, 
Z, -Y, -X and -Z. The invariant Glu or Asp at position 12 provides two oxygens for liganding Ca 
(bidentate ligand). The consensus pattern includes the complete EF-hand loop as well as the first residue 
which follows the loop and which seems to always be hydrophobic: D-x-[DNS]-{ILVFYW}- 
[DENSTG]-[DNQGHRK]-{GP}-[LIVM 
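The position labelling described above (positions 1, 3, 5, 7, 9 and 12 denoted X, Y, Z, -Y, -X and -Z) can be made concrete with a small lookup. In this sketch the function names are hypothetical and the twelve-residue example loop is an illustrative string chosen for the demonstration, not a sequence from the Sequence Listing.

```python
# 1-based loop positions of the six calcium-coordinating residues.
LIGAND_POSITIONS = {"X": 1, "Y": 3, "Z": 5, "-Y": 7, "-X": 9, "-Z": 12}


def ef_hand_ligands(loop):
    """Return the residue at each labelled position of a 12-residue loop."""
    if len(loop) != 12:
        raise ValueError("an EF-hand loop is expected to have 12 residues")
    return {label: loop[pos - 1] for label, pos in LIGAND_POSITIONS.items()}


def has_bidentate_residue(loop):
    """True when position 12 (-Z) is Glu or Asp, as the text requires."""
    return ef_hand_ligands(loop)["-Z"] in ("E", "D")


if __name__ == "__main__":
    example_loop = "DKDGDGTITTKE"  # illustrative 12-residue string
    print(ef_hand_ligands(example_loop))
    print(has_bidentate_residue(example_loop))
```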

Homeobox domain (homeobox; Pfam Accession No. PF00046). SEQ ID NO:2306 represents a 
polynucleotide encoding a protein having a homeobox domain. The 'homeobox' is a protein domain of 
60 amino acids (Gehring In: Guidebook to the Homeobox Genes, Duboule D., Ed., pp. 1-10, Oxford 
University Press, Oxford, (1994); Buerglin In: Guidebook to the Homeobox Genes, pp. 25-72, Oxford 
University Press, Oxford, (1994); Gehring Trends Biochem. Sci. (1992) 17:277-280; Gehring et 
al. Annu. Rev. Genet. (1986) 20:147-173; Schofield Trends Neurosci. (1987) 10:3-6) first identified in 
a number of Drosophila homeotic and segmentation proteins. It is extremely well conserved in many other 
animals, including vertebrates. This domain binds DNA through a helix-turn-helix type of structure. 
Several proteins that contain a homeobox domain play an important role in development. Most of these 
proteins are sequence-specific DNA-binding transcription factors. The homeobox domain is also very 
similar to a region of the yeast mating type proteins. These are sequence-specific DNA-binding proteins 
that act as master switches in yeast differentiation by controlling gene expression in a cell type-specific 
fashion. 

A schematic representation of the homeobox domain is shown below. The helix-turn-helix 
region is shown by the symbols 'H' (for helix) and 't' (for turn). 
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxHHHHHHH^ 

1 60 
The pattern detects homeobox sequences 24 residues long and spans positions 34 to 57 of the homeobox 
domain. The consensus pattern is as follows: [LIVMFYGHASLW]-x^ 
x(4)-[LIV]-[RKNQESTAIY]-[LIWSTNKH]-W^ 

Metallothioneins (metalthio; Pfam Accession No. PF00131). SEQ ID NO:2318 corresponds to
a polynucleotide encoding a member of the metallothionein (MT) protein family (Hamer Annu. Rev.
Biochem. (1986) 55:913-951; and Kagi et al. Biochemistry (1988) 27:8509-8515), small proteins
which bind heavy metals such as zinc, copper, cadmium, nickel, etc., through clusters of thiolate bonds.
MTs occur throughout the animal kingdom and are also found in higher plants, fungi and some
prokaryotes. On the basis of structural relationships MTs have been subdivided into three classes.
Class I includes mammalian MTs as well as MTs from crustaceans and molluscs, but with clearly related
primary structure. Class II groups together MTs from various species such as sea urchins, fungi, insects
and cyanobacteria which display none or only very distant correspondence to class I MTs. Class III
MTs are atypical polypeptides containing gamma-glutamylcysteinyl units. The consensus pattern for
this protein family is: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K.
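
The consensus patterns quoted in this section use a PROSITE-style notation: "x" is any residue, "[...]" lists allowed residues, "{...}" lists excluded residues, and "(n)" or "(n,m)" gives a repeat count. The following is a minimal sketch, not part of the original disclosure, of how such a pattern could be translated into a regular expression and applied to a protein sequence. The function name prosite_to_regex and the test peptide are hypothetical; the peptide is synthetic and was written only so that it satisfies the metallothionein pattern.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    tokens = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        if not element:
            continue
        # Split each element into its core symbol and an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            token = core                       # single residue or [allowed set]
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        tokens.append(token)
    return "".join(tokens)


# Metallothionein consensus pattern as quoted in the text.
mt_regex = prosite_to_regex("C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K")

# Synthetic (made-up) peptide containing one stretch that fits the pattern.
synthetic_peptide = "MDP" + "CSCAGSCKCKECKCTSCKK" + "SA"
hit = re.search(mt_regex, synthetic_peptide)
print(mt_regex, hit.start() if hit else None)
```

The same helper handles the other patterns given in this section, for example the excluded-residue element of the RNA recognition motif pattern and the variable-length gaps of the C2H2 zinc finger pattern.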

Neurotransmitter-Gated Ion-Channel (neur chan; Pfam Accession No. PF00065). SEQ ID
NO:2300 corresponds to a sequence encoding a neurotransmitter-gated ion channel. Neurotransmitter-gated
ion-channels, which provide the molecular basis for rapid signal transmission at chemical
synapses, are post-synaptic oligomeric transmembrane complexes that transiently form an ionic channel
upon the binding of a specific neurotransmitter. Five types of neurotransmitter-gated receptors are
known: 1) nicotinic acetylcholine receptor (AchR); 2) glycine receptor; 3) gamma-aminobutyric-acid
(GABA) receptor; 4) serotonin 5HT3 receptor; and 5) glutamate receptor. All known sequences of
subunits from neurotransmitter-gated ion-channels are structurally related, and are composed of a large
extracellular glycosylated N-terminal ligand-binding domain, followed by three hydrophobic
transmembrane regions that form the ionic channel, followed by an intracellular region of variable
length. A fourth hydrophobic region is found at the C-terminal of the sequence. The consensus pattern
is: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C, where the two C's are linked by a disulfide
bond.

Ras family proteins (ras; Pfam Accession No. PF00071). SEQ ID NOS:2228 and 2287
represent polynucleotides encoding members of the ras family of small GTP/GDP-binding proteins (Valencia et al.,
1991, Biochemistry 30:4637-4648). Ras family members generally require a specific guanine nucleotide
exchange factor (GEF) and a specific GTPase activating protein (GAP) as stimulators of overall GTPase
activity. Among ras-related proteins, the highest degree of sequence conservation is found in four
regions that are directly involved in guanine nucleotide binding. The first two constitute most of the
phosphate and Mg2+ binding site (PM site) and are located in the first half of the G-domain. The other
two regions are involved in guanosine binding and are located in the C-terminal half of the molecule.
Motifs and conserved structural features of the ras-related proteins are described in Valencia et al., 1991,
Biochemistry 30:4637-4648. A major consensus pattern of ras proteins is:
D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y.

RNA Recognition Motif (rrm; Pfam Accession No. PF00076). SEQ ID NOS:662, 683, 708,
781, and 783 correspond to sequences encoding an RNA recognition motif, also known as an RRM,
RBD, or RNP domain. This domain, which is about 90 amino acids long, is contained in eukaryotic
proteins that bind single-stranded RNA (Bandziulis et al. Genes Dev. (1989) 3:431-437; Dreyfuss et al.
Trends Biochem. Sci. (1988) 13:86-91). Two regions within the RNA-binding domain are highly
conserved: the first is a hydrophobic segment of six residues (which is called the RNP-2 motif), the
second is an octapeptide motif (which is called RNP-1 or RNP-CS). The consensus pattern is:
[RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM].

Kinase Domain of Tors (tor domain2). SEQ ID NO:2302 corresponds to a member of the TOR
lipid kinase protein family. This family is composed of large proteins with a lipid and protein kinase
domain and characterized through their sensitivity to rapamycin (an antifungal compound). TOR
proteins are involved in signal transduction downstream of PI3 kinase and many other signals. TOR
(also called FRAP, RAFT) plays a role in regulating protein synthesis and cell growth, and in yeast
controls translation initiation and early G1 progression. See, e.g., Barbet et al. Mol. Biol. Cell (1996)
7(1):25-42; Helliwell et al. Genetics (1998) 148:99-112.

WD Domain, G-Beta Repeats (WD domain; Pfam Accession No. PF00400). SEQ ID
NOS:1110, 1744, 1993, and 2083 represent novel members of the WD domain/G-beta repeat family.
Beta-transducin (G-beta) is one of the three subunits (alpha, beta, and gamma) of the guanine
nucleotide-binding proteins (G proteins) which act as intermediaries in the transduction of signals
generated by transmembrane receptors (Gilman, Annu. Rev. Biochem. (1987) 56:615). The alpha
subunit binds to and hydrolyzes GTP; the functions of the beta and gamma subunits are less clear but
they seem to be required for the replacement of GDP by GTP as well as for membrane anchoring and
receptor recognition. In higher eukaryotes, G-beta exists as a small multigene family of highly conserved
proteins of about 340 amino acid residues. Structurally, G-beta consists of eight tandem repeats of
about 40 residues, each containing a central Trp-Asp motif (this type of repeat is sometimes called a
WD-40 repeat). The consensus pattern for the WD domain/G-Beta repeat family is: [LIVMSTAC]-
[LIVMFYWSTAGC]-[LMS
[LIVMFSTAG]-W-[DEN]-[LWMFSTAGCN].

Zinc Finger, C2H2 Type (Zincfing C2H2; Pfam Accession No. PF00096). SEQ ID NO:779
corresponds to a polynucleotide encoding a member of the C2H2 type zinc finger protein family, which
contains zinc finger domains that facilitate nucleic acid binding (Klug et al., Trends Biochem. Sci. (1987)
12:464; Evans et al., Cell (1988) 52:1; Payre et al., FEBS Lett. (1988) 234:245; Miller et al., EMBO J.
(1985) 4:1609; and Berg, Proc. Natl. Acad. Sci. USA (1988) 85:99).

In addition to the conserved zinc ligand residues, a number of other positions are also important
for the structural integrity of the C2H2 zinc fingers (Rosenfeld et al., J. Biomol. Struct. Dyn. (1993)
11:557). The best conserved position, which is generally an aromatic or aliphatic residue, is located four
residues after the second cysteine. The consensus pattern for C2H2 zinc fingers is:
C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H. The two C's and two H's are zinc ligands.

Example 4: Differential Expression of Polynucleotides of the Invention: Description of Libraries
and Detection of Differential Expression

The relative expression levels of the polynucleotides of the invention were assessed in several
libraries prepared from various sources, including cell lines and patient tissue samples. Table 4 provides
a summary of these libraries, including the shortened library name (used hereafter), the mRNA source
used to prepare the cDNA library, and the approximate number of clones in the library.

Table 4. Description of cDNA Libraries

Library (Lib#) | Description | Number of Clones in Library
1 | Human Colon Cell Line KM12L4: High Metastatic Potential (derived from KM12C) | 308731
2 | Human Colon Cell Line KM12C: Low Metastatic Potential | 284771
3 | Human Breast Cancer Cell Line MDA-MB-231: High Metastatic Potential; micro-mets in lung | 326937
4 | Human Breast Cancer Cell Line MCF7: Non Metastatic | 318979
8 | Human Lung Cancer Cell Line MV-522: High Metastatic Potential | 223620
9 | Human Lung Cancer Cell Line UCP-3: Low Metastatic Potential | 312503
12 | Human microvascular endothelial cells (HMVEC) - UNTREATED (PCR (OligodT) cDNA library) | 41938
13 | Human microvascular endothelial cells (HMVEC) - bFGF TREATED (PCR (OligodT) cDNA library) | 42100
14 | Human microvascular endothelial cells (HMVEC) - VEGF TREATED (PCR (OligodT) cDNA library) | 42825
15 | Normal Colon - UC#2 Patient (MICRODISSECTED PCR (OligodT) cDNA library) | 282722
16 | Colon Tumor - UC#2 Patient (MICRODISSECTED PCR (OligodT) cDNA library) | 298831
17 | Liver Metastasis from Colon Tumor of UC#2 Patient (MICRODISSECTED PCR (OligodT) cDNA library) | 303467
18 | Normal Colon - UC#3 Patient (MICRODISSECTED PCR (OligodT) cDNA library) | 36216
19 | Colon Tumor - UC#3 Patient (MICRODISSECTED PCR (OligodT) cDNA library) | 41388
20 | Liver Metastasis from Colon Tumor of UC#3 Patient (MICRODISSECTED PCR (OligodT) cDNA library) | 30956
21 | GRRpz Cells derived from normal prostate epithelium | 164801
22 | WOca Cells derived from Gleason Grade 4 prostate cancer epithelium | 162088
23 | Normal Lung Epithelium of Patient #1006 (MICRODISSECTED PCR (OligodT) cDNA library) | 306198
24 | Primary tumor, Large Cell Carcinoma of Patient #1006 (MICRODISSECTED PCR (OligodT) cDNA library) | 309349

The KM12L4 cell line (Morikawa et al., Cancer Res. (1988) 48:6863) is derived from the
KM12C cell line (Morikawa et al., Cancer Res. (1988) 48:1943-1948). The KM12C cell line, which is
poorly metastatic (low metastatic), was established in culture from a Dukes' stage B2 surgical specimen
(Morikawa et al., Cancer Res. (1988) 48:6863). The KM12L4-A is a highly metastatic subline derived
from KM12C (Yeatman et al., Nucl. Acids Res. (1995) 23:4007; Bao-Ling et al., Proc. Annu. Meet. Am.
Assoc. Cancer Res. (1995) 27:3269). The KM12C and KM12C-derived cell lines (e.g., KM12L4,
KM12L4-A, etc.) are well-recognized in the art as model cell lines for the study of colon cancer (see,
e.g., Morikawa et al., supra; Radinsky et al., Clin. Cancer Res. (1995) 1:19; Yeatman et al., (1995)
supra; Yeatman et al., Clin. Exp. Metastasis (1996) 14:246). The MDA-MB-231 cell line was
originally isolated from pleural effusions (Cailleau, J. Natl. Cancer Inst. (1974) 53:661), is of high
metastatic potential, and forms poorly differentiated adenocarcinoma grade II in nude mice consistent
with breast carcinoma. The MCF7 cell line was derived from a pleural effusion of a breast
adenocarcinoma and is non-metastatic. The MDA-MB-231 and MCF-7 cell lines are well-recognized in
the art as models for the study of human breast cancer (see, e.g., Chandrasekaran et al., Cancer Res.
(1979) 39:870; Gastpar et al., J. Med. Chem. (1998) 41:4965; Ranson et al., Br. J. Cancer (1998)
77:1586; and Kuang et al., Nucleic Acids Res. (1998) 26:1116).

The MV-522 cell line is derived from a human lung carcinoma and is of high metastatic
potential. The UCP-3 cell line is a low metastatic human lung carcinoma cell line; the MV-522 is a high
metastatic variant of UCP-3. These cell lines are well-recognized in the art as models for the study of
human lung cancer (see, e.g., Varki et al., Int. J. Cancer (1987) 40:46 (UCP-3); Varki et al., Tumour
Biol. (1990) 11:327 (MV-522 and UCP-3); Varki et al., Anticancer Res. (1990) 10:637 (MV-522);
Kelner et al., Anticancer Res. (1995) 15:867 (MV-522); and Zhang et al., Anticancer Drugs (1997)
8:696 (MV-522)). The samples of libraries 15-20 are derived from two different patients (UC#2 and
UC#3). The bFGF-treated HMVEC were prepared by incubation with bFGF at 10 ng/ml for 2 hrs; the
VEGF-treated HMVEC were prepared by incubation with 20 ng/ml VEGF for 2 hrs. Following
incubation with the respective growth factor, the cells were washed and lysis buffer added for RNA
preparation. The GRRpz and WOca cell lines were provided by Dr. Donna M. Peehl, Department of
Medicine, Stanford University School of Medicine. GRRpz was derived from normal prostate
epithelium. The WOca cell line is a Gleason Grade 4 cell line.

Each of the libraries is composed of a collection of cDNA clones that in turn are representative
of the mRNAs expressed in the indicated mRNA source. In order to facilitate the analysis of the millions
of sequences in each library, the sequences were assigned to clusters. The concept of "cluster of clones"
is derived from a sorting/grouping of cDNA clones based on their hybridization pattern to a panel of
roughly 300 7-bp oligonucleotide probes (see Drmanac et al., Genomics (1996) 37(1):29). Random
cDNA clones from a tissue library are hybridized at moderate stringency to 300 7-bp oligonucleotides.
Each oligonucleotide has some measure of specific hybridization to that specific clone. The combination
of 300 of these measures of hybridization for 300 probes equals the "hybridization signature" for a
specific clone. Clones with similar sequence will have similar hybridization signatures. By developing a
sorting/grouping algorithm to analyze these signatures, groups of clones in a library can be identified and
brought together computationally. These groups of clones are termed "clusters". Depending on the
stringency of the selection in the algorithm (similar to the stringency of hybridization in a classic library
cDNA screening protocol), the "purity" of each cluster can be controlled. For example, artifacts of
clustering may occur in computational clustering just as artifacts can occur in "wet-lab" screening of a
cDNA library with 400 bp cDNA fragments, at even the highest stringency. The stringency used in the
implementation of clustering herein provides groups of clones that are in general from the same cDNA or
closely related cDNAs. Closely related clones can be a result of different length clones of the same
cDNA, closely related clones from highly related gene families, or splice variants of the same cDNA.
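
The grouping of clones by hybridization signature can be illustrated with a short sketch. The specification does not disclose the sorting/grouping algorithm itself, so what follows is only an assumed stand-in: a greedy procedure that adds a clone to the first cluster whose representative signature has a Pearson correlation at or above a "stringency" threshold, and otherwise starts a new cluster. The function names, the threshold value, and the four-probe toy signatures (in place of roughly 300 probes) are all hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length hybridization signatures."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat signature carries no pattern to compare
    return cov / sqrt(var_a * var_b)


def cluster_signatures(signatures, stringency=0.9):
    """Greedy grouping: each cluster is a list of clone ids; the first id is its representative."""
    clusters = []
    for clone_id, signature in signatures.items():
        for members in clusters:
            representative = signatures[members[0]]
            if pearson(signature, representative) >= stringency:
                members.append(clone_id)
                break
        else:
            clusters.append([clone_id])  # no existing cluster was similar enough
    return clusters


# Toy signatures: one hybridization measure per probe (hypothetical values).
toy_signatures = {
    "clone_a": [1.0, 5.0, 2.0, 8.0],
    "clone_b": [1.1, 4.9, 2.2, 7.9],   # nearly the same pattern as clone_a
    "clone_c": [8.0, 1.0, 7.0, 1.0],   # opposite pattern
}
toy_clusters = cluster_signatures(toy_signatures)
print(toy_clusters)
```

Raising the stringency parameter yields smaller, purer clusters, which mirrors the trade-off described in the text.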

Differential expression for a selected cluster was assessed by first determining the number of
cDNA clones corresponding to the selected cluster in the first library (Clones in 1st), and then determining
the number of cDNA clones corresponding to the selected cluster in the second library (Clones in 2nd).
Differential expression of the selected cluster in the first library relative to the second library is
expressed as a "ratio" of percent expression between the two libraries. In general, the "ratio" is
calculated by: 1) calculating the percent expression of the selected cluster in the first library by dividing
the number of clones corresponding to a selected cluster in the first library by the total number of clones
analyzed from the first library; 2) calculating the percent expression of the selected cluster in the second
library by dividing the number of clones corresponding to a selected cluster in a second library by the
total number of clones analyzed from the second library; 3) dividing the calculated percent expression
from the first library by the calculated percent expression from the second library. If the "number of
clones" corresponding to a selected cluster in a library is zero, the value is set at 1 to aid in calculation.
The formula used in calculating the ratio takes into account the "depth" of each of the libraries being
compared, i.e., the total number of clones analyzed in each library.
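
The three-step ratio calculation above can be written out as a short sketch. The function name and the clone counts in the example are hypothetical; only the procedure (percent expression per library, zero counts set to 1, then division) comes from the text.

```python
def expression_ratio(clones_first, total_first, clones_second, total_second):
    """Ratio of percent expression of a cluster in the first library relative to the second."""
    # A count of zero is set to 1 to aid in calculation, as described in the text.
    clones_first = max(clones_first, 1)
    clones_second = max(clones_second, 1)
    percent_first = clones_first / total_first      # step 1
    percent_second = clones_second / total_second   # step 2
    return percent_first / percent_second           # step 3


# Hypothetical example: 10 clones of a cluster among 1000 analyzed in the first
# library, none among 2000 analyzed in the second library.
example_ratio = expression_ratio(10, 1000, 0, 2000)
print(example_ratio)
```

Because each count is divided by the total for its own library, two libraries of different depth can be compared directly.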

In general, a polynucleotide is said to be significantly differentially expressed between two
samples when the ratio value is greater than at least about 2, preferably greater than at least about 3,
more preferably greater than at least about 5, where the ratio value is calculated using the method
described above. The significance of differential expression is determined using a z score test (Zar,
Biostatistical Analysis, Prentice Hall, Inc., USA, "Differences between Proportions," pp 296-298
(1974)).
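
A z score test for the difference between two proportions can be sketched as follows. The text cites Zar but does not spell out the formula, so this sketch assumes the common pooled two-proportion form with a two-sided p-value from the normal distribution; the function name and the clone counts are hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z(clones_first, total_first, clones_second, total_second):
    """Return (z, two-sided p-value) for the difference between two clone proportions."""
    p_first = clones_first / total_first
    p_second = clones_second / total_second
    # Pooled proportion under the null hypothesis of equal expression.
    pooled = (clones_first + clones_second) / (total_first + total_second)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / total_first + 1 / total_second))
    if standard_error == 0:
        raise ValueError("z score is undefined when the pooled proportion is 0 or 1")
    z = (p_first - p_second) / standard_error
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return z, p_value


# Hypothetical counts: 30 of 1000 clones in one library versus 10 of 1000 in the other.
z_score, p_value = two_proportion_z(30, 1000, 10, 1000)
print(round(z_score, 2), p_value)
```

A cluster would then be reported only if both the ratio threshold and the chosen significance level are met.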

Example 5: Differential Expression of Genes Corresponding to Polynucleotides of the Invention 

A number of polynucleotide sequences have been identified that are differentially expressed
between, for example, cells derived from high metastatic potential cancer tissue and low metastatic
cancer cells, and between cells derived from metastatic cancer tissue and normal tissue. Evaluation of
the levels of expression of the genes corresponding to these sequences can be valuable in diagnosis,
prognosis, and/or treatment (e.g., to facilitate rational design of therapy, monitoring during and after
therapy, etc.). Moreover, the genes corresponding to differentially expressed sequences described herein
can be therapeutic targets due to their involvement in regulation (e.g., inhibition or promotion) of
development of, for example, the metastatic phenotype. For example, sequences that correspond to
genes that are increased in expression in high metastatic potential cells relative to normal or non-metastatic
tumor cells may encode genes or regulatory sequences involved in processes such as
angiogenesis, differentiation, cell replication, and metastasis.

Detection of the relative expression levels of differentially expressed polynucleotides described
herein can provide valuable information to guide the clinician in the choice of therapy. For example, a
patient sample exhibiting an expression level of one or more of these polynucleotides that corresponds to
a gene that is increased in expression in metastatic or high metastatic potential cells may warrant more
aggressive treatment for the patient. In contrast, detection of expression levels of a polynucleotide
sequence that corresponds to expression levels associated with that of low metastatic potential cells may
warrant a more positive prognosis than the gross pathology would suggest.

The differential expression of the polynucleotides described herein can thus be used, for
example, as diagnostic markers or prognostic markers, or for risk assessment, patient treatment and the like.
These polynucleotide sequences can also be used in combination with other known molecular and/or
biochemical markers. The following examples provide relative expression levels of polynucleotides
from specified cell lines and patient tissue samples.

The differential expression data for polynucleotides of the invention that have been identified as
being differentially expressed across various combinations of the libraries described above are
summarized in Table 5 (inserted prior to the claims). Table 5 provides: 1) the Sequence Identification
Number ("SEQ") assigned to the polynucleotide; 2) the cluster ("CLST") to which the polynucleotide
has been assigned as described above; 3) the library comparisons that resulted in identification of the
polynucleotide as being differentially expressed ("Library Pair A,B"), with shorthand names of the
compared libraries provided in parentheses following the library numbers; 4) the number of clones
corresponding to the polynucleotide in the first library listed ("A"); 5) the number of clones
corresponding to the polynucleotide in the second library listed ("B"); 6) the "A/B" ratio where the
comparison resulted in a finding that the number of clones in library A is greater than the number of
clones in library B; and 7) the "B/A" ratio where the comparison resulted in a finding that the number of
clones in library B is greater than the number of clones in library A.

Example 6: Source of Biological Materials for Microarray-Based Experiments

The biological materials used in the experiments described in the subsequent examples relating
to microarray data are described below.

Source of patient tissue samples 

Normal and cancerous tissues were collected from patients using laser capture microdissection
(LCM) techniques, which techniques are well known in the art (see, e.g., Ohyama et al. (2000)
Biotechniques 29:530-6; Curran et al. (2000) Mol. Pathol. 53:64-8; Suarez-Quian et al. (1999)
Biotechniques 26:328-35; Simone et al. (1998) Trends Genet. 14:272-6; Conia et al. (1997) J. Clin.
Lab. Anal. 11:28-38; Emmert-Buck et al. (1996) Science 274:998-1001). Table 9 (inserted following
the last page of the Examples) provides information about each patient from which the samples were
isolated, including: the Patient ID and Path ReportID, numbers assigned to the patient and the pathology
reports for identification purposes; the anatomical location of the tumor (AnatomicalLoc); the Primary
Tumor Size; the Primary Tumor Grade; the Histopathologic Grade; a description of local sites to which
the tumor had invaded (Local Invasion); the presence of lymph node metastases (Lymph Node
Metastasis); incidence of lymph node metastases (provided as number of lymph nodes positive for
metastasis over the number of lymph nodes examined) (Incidence Lymphnode Metastasis); the Regional
Lymphnode Grade; the identification or detection of metastases to sites distant to the tumor and their
location (Distant Met & Loc); a description of the distant metastases (Description Distant Met); the
grade of distant metastasis (Distant Met Grade); and general comments about the patient or the tumor
(Comments). Adenoma was not described in any of the patients; adenoma dysplasia (described as
hyperplasia by the pathologist) was described in Patient ID No. 695. Extranodal extensions were
described in two patients, Patient ID Nos. 784 and 791. Lymphovascular invasion was described in
seven patients, Patient ID Nos. 128, 278, 517, 534, 784, 786, and 791. Crohn's-like infiltrates were
described in seven patients, Patient ID Nos. 52, 264, 268, 392, 393, 784, and 791.

Polynucleotides on arrays

Polynucleotides spotted on the arrays were generated by PCR amplification of clones derived
from cDNA libraries. The clones used for amplification were either the clones from which the sequences
described herein (SEQ ID NOS:1-2396) were derived, or clones having inserts with significant
polynucleotide sequence overlap with the sequences described herein (SEQ ID NOS:1-2396) as determined
by BLAST2 homology searching.

Example 7: Microarray Design 

Each array used in the examples below had an identical spatial layout and control spot set. Each
microarray was divided into two areas, each area having an array with, on each half, twelve groupings of
32 x 12 spots for a total of about 9,216 spots on each array. The two areas are spotted identically, which
provides for at least two duplicates of each clone per array. Spotting was accomplished using PCR-amplified
products from 0.5 kb to 2.0 kb, spotted using a Molecular Dynamics Gen III spotter
according to the manufacturer's recommendations. The first row of each of the 24 regions on the array
had about 32 control spots, including 4 negative control spots and 8 test polynucleotides.
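
The spot counts quoted above can be checked with simple arithmetic. The sketch below assumes one reading of the layout, namely two identically spotted areas with twelve groupings each and 32 x 12 spots per grouping, with one 32-spot control row per grouping; the variable names are hypothetical.

```python
AREAS_PER_ARRAY = 2          # two identically spotted areas
GROUPINGS_PER_AREA = 12      # twelve groupings in each area
SPOTS_PER_ROW = 32           # each grouping is 32 x 12 spots
ROWS_PER_GROUPING = 12

regions_per_array = AREAS_PER_ARRAY * GROUPINGS_PER_AREA
spots_per_grouping = SPOTS_PER_ROW * ROWS_PER_GROUPING
spots_per_array = regions_per_array * spots_per_grouping
spots_per_area = spots_per_array // AREAS_PER_ARRAY
control_spots_per_array = regions_per_array * SPOTS_PER_ROW  # first row of every region

print(regions_per_array, spots_per_array, spots_per_area, control_spots_per_array)
```

Under this reading, the 24 regions and the total of about 9,216 spots stated in the text are consistent with each other.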

The test polynucleotides were spiked into each sample before the labeling reaction with a range
of concentrations from 2-600 pg/slide and ratios of 1:1. For each array design, two slides were
hybridized with the test samples reverse-labeled in the labeling reaction. This provided for about 4
duplicate measurements for each clone, two of one color and two of the other, for each sample.

Example 8: Identification Of Differentially Expressed Genes 

cDNA probes were prepared from total RNA isolated from the patient cells described in
Example 6. Since LCM provides for the isolation of specific cell types to provide a substantially
homogenous cell sample, this provided for a similarly pure RNA sample.

Total RNA was first reverse transcribed into cDNA using a primer containing a T7 RNA
polymerase promoter, followed by second strand DNA synthesis. cDNA was then transcribed in vitro
to produce antisense RNA using the T7 promoter-mediated expression (see, e.g., Luo et al. (1999)
Nature Med. 5:117-122), and the antisense RNA was then converted into cDNA. The second set of
cDNAs was again transcribed in vitro, using the T7 promoter, to provide antisense RNA. Optionally,
the RNA was again converted into cDNA, allowing for up to a third round of T7-mediated amplification
to produce more antisense RNA. Thus the procedure provided for two or three rounds of in vitro
transcription to produce the final RNA used for fluorescent labeling. Fluorescent probes were generated
by first adding control RNA to the antisense RNA mix, and producing fluorescently labeled cDNA from
the RNA starting material. Fluorescently labeled cDNAs prepared from the tumor RNA sample were
compared to fluorescently labeled cDNAs prepared from normal cell RNA sample. For example, the
cDNA probes from the normal cells were labeled with Cy3 fluorescent dye (green) and the cDNA probes
prepared from the tumor cells were labeled with Cy5 fluorescent dye (red).

The differential expression assay was performed by mixing equal amounts of probes from tumor
cells and normal cells of the same patient. The arrays were prehybridized by incubation for about 2 hrs
at 60°C in 5X SSC/0.2% SDS/1 mM EDTA, and then washed three times in water and twice in
isopropanol. Following prehybridization of the array, the probe mixture was then hybridized to the array
under conditions of high stringency (overnight at 42°C in 50% formamide, 5X SSC, and 0.2% SDS).
After hybridization, the array was washed at 55°C three times as follows: 1) first wash in 1X SSC/0.2%
SDS; 2) second wash in 0.1X SSC/0.2% SDS; and 3) third wash in 0.1X SSC.

The arrays were then scanned for green and red fluorescence using a Molecular Dynamics
Generation III dual color laser-scanner/detector. The images were processed using BioDiscovery
Autogene software, and the data from each scan set normalized to provide for a ratio of expression
relative to normal. Data from the microarray experiments were analyzed according to the algorithms
described in U.S. application serial no. 60/252,358, filed November 20, 2000, by E.J. Moler, M.A.
Boyle, and F.M. Randazzo, and entitled "Precision and accuracy in cDNA microarray data," which
application is specifically incorporated herein by reference.

The experiment was repeated, this time labeling the two probes with the opposite color in order
to perform the assay in both "color directions." Each experiment was sometimes repeated with two more
slides (one in each color direction). The level of fluorescence for each sequence on the array was expressed as a
ratio of the geometric mean of 8 replicate spots/gene from the four arrays or 4 replicate spots/gene from
2 arrays or some other permutation. The data were normalized using the spiked positive controls present
in each duplicated area, and the precision of this normalization was included in the final determination of
the significance of each differential. The fluorescent intensity of each spot was also compared to the
negative controls in each duplicated area to determine which spots have detected significant expression
levels in each sample.
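
Combining replicate spots by a geometric mean can be sketched as follows. This is an illustration under stated assumptions, not the analysis software used in the experiments: it assumes each spot has already been reduced to a single ratio, and that spots on the reverse-labeled ("dye-swapped") slides are recorded as normal/tumor and must be inverted before they are combined with the tumor/normal ratios. The function names and ratio values are hypothetical.

```python
from math import exp, log


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed through logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    return exp(sum(log(v) for v in values) / len(values))


def combined_ratio(forward_ratios, reversed_ratios):
    """Combine replicate spot ratios from both color directions into one tumor/normal ratio."""
    # Assumption: reversed (dye-swapped) ratios are normal/tumor, so invert them first.
    oriented = list(forward_ratios) + [1.0 / r for r in reversed_ratios]
    return geometric_mean(oriented)


# Hypothetical replicate ratios for one gene: two spots in each color direction.
gene_ratio = combined_ratio([2.0, 8.0], [0.5, 0.125])
print(gene_ratio)
```

Averaging on the log scale treats a two-fold increase and a two-fold decrease symmetrically, which is why a geometric rather than arithmetic mean suits expression ratios.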

A statistical analysis of the fluorescent intensities was applied to each set of duplicate spots to
assess the precision and significance of each differential measurement, resulting in a p-value testing the
null hypothesis that there is no differential in the expression level between the tumor and normal
samples of each patient. During initial analysis of the microarrays, the hypothesis was accepted if
p > 10^-3, and the differential ratio was set to 1.000 for those spots. All other spots have a significant difference
in expression between the tumor and normal sample. If the tumor sample has detectable expression and
the normal does not, the ratio is truncated at 1000 since the value for expression in the normal sample
would be zero, and the ratio would not be a mathematically useful value (e.g., infinity). If the normal
sample has detectable expression and the tumor does not, the ratio is truncated to 0.001, since the value
for expression in the tumor sample would be zero and the ratio would not be a mathematically useful
value. These latter two situations are referred to herein as "on/off." Database tables were populated
using a 95% confidence level (p>0.05).
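
The ratio rules described above (ratio set to 1 when the null hypothesis is accepted, and "on/off" truncation at 1000 or 0.001) can be sketched in a few lines. The function and parameter names are hypothetical, the detection flags stand in for the comparison against negative controls, and the handling of a spot detected in neither sample is an assumption, since the text does not address that case.

```python
def differential_ratio(tumor, normal, p_value,
                       tumor_detected=True, normal_detected=True, alpha=1e-3):
    """Tumor/normal ratio with the significance cutoff and on/off truncation applied."""
    if p_value > alpha:
        return 1.0        # null hypothesis accepted: no differential
    if tumor_detected and not normal_detected:
        return 1000.0     # "on" in tumor only: ratio truncated at 1000
    if normal_detected and not tumor_detected:
        return 0.001      # "on" in normal only: ratio truncated to 0.001
    if not tumor_detected and not normal_detected:
        return 1.0        # assumption: nothing detected, so no differential reported
    return tumor / normal


# Hypothetical intensities and p-values.
print(differential_ratio(500.0, 100.0, p_value=1e-5))
print(differential_ratio(500.0, 0.0, p_value=1e-6, normal_detected=False))
```

Truncating rather than dividing by zero keeps every entry in the results tables a finite number that can be sorted and thresholded.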

Tables 10-14 summarize the results of the differential expression analysis, where the
difference in the expression level in the colon tumor cell relative to the matched normal colon cells is
greater than or equal to 2 fold (">=2x"), 2.5 fold (">=2.5x"), or 5 fold (">=5x") in at least 20% or more
of the patients analyzed. Each table provides: the SEQ ID NO; the percentage of patients tested having
a colon tumor that exhibited at least 2 fold (">=2x"), 2.5 fold (">=2.5x"), or 5 fold (">=5x") increase in
expression levels of the indicated gene relative to matched normal colon tissue; and the ratio data for
each patient sample tested (columns headed by "PA," indicating the Patient Identification Number, e.g.,
"P15" indicates the ratio data for patient 15).

In general, a polynucleotide is said to represent a significantly differentially expressed gene
between two samples when there are detectable levels of expression in at least one sample and the ratio
value is greater than at least about 1.2 fold, preferably greater than at least about 1.5 fold, more
preferably greater than at least about 2 fold, where the ratio value is calculated using the method
described above.
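
The per-gene summary columns of the results tables (the percentage of patients whose tumor shows at least a 2 fold, 2.5 fold, or 5 fold increase) can be computed from the per-patient ratios as sketched below. The function name, the patient column labels, and the ratio values are hypothetical and do not come from the tables.

```python
def percent_at_or_above(patient_ratios, thresholds=(2.0, 2.5, 5.0)):
    """Percentage of patients whose tumor/normal ratio meets each fold-change threshold."""
    n_patients = len(patient_ratios)
    return {
        threshold: 100.0 * sum(ratio >= threshold for ratio in patient_ratios.values()) / n_patients
        for threshold in thresholds
    }


# Hypothetical ratio data for one gene across five patients.
ratios_for_gene = {"P15": 1.0, "P52": 2.2, "P121": 6.0, "P125": 3.0, "P128": 1.0}
summary = percent_at_or_above(ratios_for_gene)
reportable = summary[2.0] >= 20.0  # listed when at least 20% of patients reach 2 fold
print(summary, reportable)
```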

A differential expression ratio of 1 indicates that the expression level of the gene in the tumor
cell was not statistically different from expression of that gene in normal colon cells of the same patient.
A differential expression ratio significantly greater than 1 in cancerous colon cells relative to normal
colon cells indicates that the gene is increased in expression in cancerous cells relative to normal cells,
indicating that the gene plays a role in the development of the cancerous phenotype, and may be involved
in promoting metastasis of the cell. Detection of gene products from such genes can provide an indicator
that the cell is cancerous, and may provide a therapeutic and/or diagnostic target.

Likewise, a differential expression ratio significantly less than 1 in cancerous colon cells relative
to normal colon cells indicates that, for example, the gene is involved in suppression of the cancerous
phenotype. Increasing activity of the gene product encoded by such a gene, or replacing such activity,
can provide the basis for chemotherapy. Such genes can also serve as markers of cancerous cells, e.g., the
absence or decreased presence of the gene product in a colon cell relative to a normal colon cell indicates
that the cell may be cancerous.

Example 9: Functional Analysis Of Gene Products Differentially Expressed In Cancer In Patients 

The gene products of genes differentially expressed in cancerous cells are further analyzed to
confirm the role and function of the gene product in tumorigenesis, e.g., in promoting or inhibiting
development of a metastatic phenotype.

Blocking expression of gene products using antisense 

The effect of single genes upon development of cancer is assessed through use of antisense
oligonucleotides specific for sequences corresponding to a selected sequence. Antisense oligonucleotides
are prepared based upon a selected sequence that corresponds to a gene of interest. The antisense
oligonucleotide is introduced into a test cell and the effect upon expression of the corresponding gene, as
well as the effect upon a phenotype of interest, is assessed (e.g., a normal cell is examined for induction of
the cancerous phenotype, or a cancerous cell is examined for suppression of a cancerous phenotype (e.g.,
suppression of metastasis)).

Blocking function of gene products using gene product-specific antibodies and/or small
molecule inhibitors

The function of gene products corresponding to genes/clusters identified herein can be assessed
by blocking function of the gene products in the cell. For example, where the gene product is secreted,
blocking antibodies can be generated and added to cells to examine the effect upon the cell phenotype in the
context of, for example, the transformation of the cell to a cancerous, particularly a metastatic,
phenotype. In order to generate antibodies, a clone corresponding to a selected gene product/cluster is
selected, and a sequence that represents a partial or complete coding sequence is obtained. The resulting
clone is then expressed, the polypeptide produced isolated, and antibodies generated. The antibodies are
then combined with cells and the effect upon tumorigenesis assessed.

Where the gene product of the gene/clusters identified herein exhibits sequence homology to a
protein of known function (e.g., to a specific kinase or protease) and/or to a protein family of known
function (e.g., contains a domain or other consensus sequence present in a protease family or in a kinase
family), then the role of the gene product in tumorigenesis, as well as the activity of the gene product, can
be examined using small molecules that inhibit or enhance function of the corresponding protein or
protein family.

Those skilled in the art will recognize, or be able to ascertain, using not more than routine 
experimentation, many equivalents to the specific embodiments of the invention described herein. Such
specific embodiments and equivalents are intended to be encompassed by the following claims. 

All publications and patent applications cited in this specification are herein incorporated by 
reference as if each individual publication or patent application were specifically and individually 
indicated to be incorporated by reference. The citation of any publication is for its disclosure prior to the 
filing date and should not be construed as an admission that the present invention is not entitled to
antedate such publication by virtue of prior invention. 

Although the foregoing invention has been described in some detail by way of illustration and 
example for purposes of clarity of understanding, it is readily apparent to those of ordinary skill in the art 
in light of the teachings of this invention that certain changes and modifications may be made thereto 
without departing from the spirit or scope of the appended claims.

Deposit Information. The following materials were deposited with the American Type Culture
Collection (CMCC = Chiron Master Culture Collection). 



Table 6. Cell Lines Deposited with ATCC

Cell Line     Deposit Date       ATCC Accession No.   CMCC Accession No.
KM12L4        March 19, 1998     CRL-12496            11606
Km12C         May 15, 1998       CRL-12533            11611
MDA-MB-231    May 15, 1998       CRL-12532            10583
MCF-7         October 9, 1998    CRL-12584            10377


In addition, pools of selected clones, as well as libraries containing specific clones, were 
assigned an "ES" number (internal reference) and deposited with the ATCC. Table 7 (inserted before 
the claims) provides the ATCC Accession Nos. and internal references (CMCC Nos.) of the ES deposits, 
all of which were deposited on or before the filing date of the present application. The names of the
clones contained within each of these deposits are provided in Table 8 (inserted before the claims).

The above material has been deposited with the American Type Culture Collection, 
Rockville, Maryland, under the accession number indicated. These deposits will be maintained under the 
terms of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the 
Purposes of Patent Procedure. The deposit will be maintained for a period of at least 30 years following 
issuance of this patent, or for the enforceable life of the patent, whichever is greater. Upon the granting
of a patent, all restrictions on the availability to the public of the deposited material will be irrevocably 
removed. 

The deposits described herein are provided merely as a convenience to those of skill in the art, and
are not an admission that a deposit is required under 35 U.S.C. §112. The sequence of the
polynucleotides contained within the deposited material, as well as the amino acid sequence of the 
polypeptides encoded thereby, are incorporated herein by reference and are controlling in the event of 
any conflict with the written description of sequences herein. A license may be required to make, use, or 
sell the deposited material, and no such license is granted hereby. 

Retrieval of Individual Clones from Deposit of Pooled Clones. Where the ATCC deposit is
composed of a pool of cDNA clones or a library of cDNA clones, the deposit was prepared by first
transfecting each of the clones into separate bacterial cells. The clones in the pool or library were then
deposited as a pool of equal mixtures in the composite deposit. Particular clones can be obtained from
the composite deposit using methods well known in the art. For example, a bacterial cell containing a
particular clone can be identified by isolating single colonies, and identifying colonies containing the
specific clone through standard colony hybridization techniques, using an oligonucleotide probe or
probes designed to specifically hybridize to a sequence of the clone insert (e.g., a probe based upon
unmasked sequence of the encoded polynucleotide having the indicated SEQ ID NO). The probe should
be designed to have a Tm of approximately 80°C (assuming 2°C for each A or T and 4°C for each G or
C). Positive colonies can then be picked, grown in culture, and the recombinant clone isolated.
Alternatively, probes designed in this manner can be used in PCR to isolate a nucleic acid molecule from
the pooled clones according to methods well known in the art, e.g., by purifying the cDNA from the
deposited culture pool, and using the probes in PCR reactions to produce an amplified product having
the corresponding desired polynucleotide sequence.
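
The Tm rule of thumb given above (2°C for each A or T, 4°C for each G or C) can be sketched as a short calculation. This is an illustrative sketch only: the function names `wallace_tm` and `shortest_probe`, and the strategy of extending a probe from the 5' end of the template until the estimate reaches 80°C, are assumptions for the example and are not taken from this specification. The template is the first 60 nucleotides of SEQ ID NO:167 as given in the sequence listing.

```python
def wallace_tm(oligo: str) -> int:
    """Estimate Tm in deg C as 2 per A/T plus 4 per G/C."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("oligo must contain only A, C, G, T")
    return 2 * at + 4 * gc


def shortest_probe(template: str, target_tm: int = 80) -> str:
    """Return the shortest 5' prefix of template whose estimated Tm reaches target_tm."""
    for end in range(1, len(template) + 1):
        candidate = template[:end]
        if wallace_tm(candidate) >= target_tm:
            return candidate
    raise ValueError("template too short to reach the target Tm")


# First 60 nt of SEQ ID NO:167 (used here purely as example input).
template = "cggcggagctgtgagccggcgactcgggtccctgaggtctggattctttctccgctactg"
probe = shortest_probe(template)
print(probe, len(probe), wallace_tm(probe))
```

In practice a probe would also be checked for uniqueness within the pool and for masked (repetitive) sequence; the sketch covers only the Tm arithmetic.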

Table 1

SEQ ID  CLUSTER  SEQ NAME                     0  CLONE ID        LIBRARY
1       734646   RTA22200010F.k.10.1.P.Seq    F  M0005648L62     CH16COP
2       400221   RTA22200001F.a.17.1.P.Seq    F  M00042528:611   CH15CON
3       205329   RTA22200006F.d.09.2.P.Seq    F  M00056020:410   CH15CON
4       446680   RTA22200001F.f.07.1.P.Seq    F  M00042693:54    CH15CON
5       1261     RTA22200021F.j.18.3.P.Seq    F  M00054812:15    CH17COHLV
6       400258   RTA22200011F.k.23.1.P.Seq    F  M00056617:86    CH16COP
7       450559   RTA22200005F.e.21.1.P.Seq    F  M00055882:16    CH15CON
8       450959   RTA22200012F.e.11.1.P.Seq    F  M00056703:46    CH16COP
9       451794   RTA22200007F.l.16.1.P.Seq    F  M00056247:76    CH15CON
10      415058   RTA22200020F.d.11.1.P.Seq    F  M0005459 L87    CH17COHLV
11      31506    RTA22200012F.b.08.1.P.Seq    F  M00056670:lll   CH16COP
12      417155   RTA22200002F.f.10.1.P.Seq    F  M00055466.-28   CH15CON
13      448925   RTA22200019F.e.21.1.P.Seq    F  M00043507:45    CH17COHLV
14      11329    RTA22200006F.d.10.2.P.Seq    F  M00056020:47    CH15CON
15      650422   RTA22200001F.n.14.1.P.Seq    F  M0004291L83     CH15CON
16      6863     RTA22200229F.f.13.1.P.Seq    F  M00006967:25    CH02COH
17      449690   RTA22200002F.g.18.1.P.Seq    F  M00055495:45    CH15CON
18      724616   RTA22200016F.j.23.1.P.Seq    F  M00057236:86    CH16COP
19      549722   RTA22200025F.m.01.2.P.Seq    F  M00055383:24    CH17COHLV
20      549722   RTA22200025F.l.24.1.P.Seq    F  M00055383:24    CH17COHLV
21      448110   RTA22200018F.m.04.1.P.Seq    F  M00043354:31    CH17COHLV
22      515631   RTA22200010F.j.14.1.P.Seq    F  M00056434:38    CH16COP
23      11881    RTA22200233F.k.04.1.P.Seq    F  M00008099:78    CH03MAH
24      650856   RTA22200012F.n.24.1.P.Seq    F  M00056772:14    CH16COP
25      449701   RTA22200012FX21.1.P.Seq      F  M00056710:89    CH16COP
26      651073   RTA22200007F.l.06.1.P.Seq    F  M00056243:710   CH15CON
27      10340    RTA22200234F.b.07.1.P.Seq    F  M00022189:23    CH03MAH
28      648310   RTA22200007F.m.04.1.P.Seq    F  M00056252:88    CH15CON
29      730336   RTA22200013F.l.02.1.P.Seq    F  M00056879:811   CH16COP
30      3060     RTA22200018F.b.10.1.P.Seq    F  M00042444:88    CH17COHLV
31      453016   RTA22200010F.l.06.1.P.Seq    F  M00056485:212   CH16COP
32      508931   RTA22200024F.L 13.1.P.Seq    F  M00055209:410   CH17COHLV
33      185461   RTA22200242F.b.06.1.P.Seq    F  M00026975:23    CH04MAL
34      452530   RTA22200015F.n.11.1.P.Seq    F  M0005713L21     CH16COP
35      448925   RTA22200026F.d.02.1.P.Seq    F  M00055419:71    CH17COHLV
36      1013     RTA22200005F.m.06.1.P.Seq    F  M00055945:811   CH15CON
37      6545     RTA22200241F.d.23.1.P.Seq    F  M00026879:410   CH04MAL
38      449891   RTA22200001F.b.23.1.P.Seq    F  M00042540:85    CH15CON
39      4045     RTA22200227F.n.06.1.P.Seq    F  M00006740:71    CH02COH
40      404475   RTA22200002F.b.23.1.P.Seq    F  M00055438:810   CH15CON
41      650297   RTA22200001F.n.10.1.P.Seq    F  M00042909.74    CH15CON
42      650493   RTA22200005F.n.03.1.P.Seq    F  M00055959:112   CH15CON
43      644884   RTA22200007F.k.04.1.P.Seq    F  M00056232:712   CH15CON
44      452212   RTA22200021F.k.21.3.P.Seq    F  M0005482L311    CH17COHLV
45      402727   RTA22200010F.n.09.1.P.Seq    F  M00056505:82    CH16COP

Table 1 (continued)

SEQ ID  CLUSTER  SEQ NAME                     0  CLONE ID        LIBRARY
136     554079   RTA22200012F.j.17.1.P.Seq    F  M00056735:28    CH16COP
137     9049     RTA22200222FL05.1.P.Seq      F  M00003948.-212  CH01COH
138     1307     RTA22200244F.n.16.1.P.Seq    F  M00027222-.39   CH04MAL
139     139730   RTA22200242F.g.11.1.P.Seq    F  M00027016:76    CH04MAL
140     7750     RTA22200241F.f.24.1.P.Seq    F  M00026899:711   CH04MAL
141     8050     RTA22200227F.p.20.1.P.Seq    F  M0000676L49     CH02COH
142     725222   RTA22200013F.k.24.1.P.Seq    F  M00056879:55    CH16COP
143     3275     RTA22200235F.j.22.2.P.Seq    F  M00022516:59    CH03MAH
144     7424     RTA22200235F.c.12.1.P.Seq    F  M00022430:44    CH03MAH
145     8953     RTA22200241F.c.05.1.P.Seq    F  M00026866:88    CH04MAL
146     8966     RTA22200243F.c.04.1.P.Seq    F  M00027088:86    CH04MAL
147     530883   RTA22200013F.i.22.1.P.Seq    F  M00056866.55    CH16COP
148     6725     RTA22200238F.l.09.1.P.Seq    F  M00022973.78    CH03MAH
149     4439     RTA22200222F.m.24.1.P.Seq    F  M00004167:411   CH01COH
150     648472   RTA22200012F.g.07.1.P.Seq    F  M00056712:17    CH16COP
151     735346   RTA22200011F.l.03.1.P.Seq    F  M00056618:22    CH16COP
152     732121   RTA22200011F.j.05.1.P.Seq    F  M00056600:87    CH16COP
153     650337   RTA22200005F.h.15.1.P.Seq    F  M00055900:25    CH15CON
154     533588   RTA22200005F.p.05.1.P.Seq    F  M0005598L17     CH15CON
155     649667   RTA22200007F.p.20.1.P.Seq    F  M00056290:82    CH15CON
156     394436   RTA22200015F.p.07.1.P.Seq    F  M00057145:45    CH16COP
157     649354   RTA22200007F.h.13.1.P.Seq    F  M00056210 53    CH15CON
158     2022     RTA22200240F.e.10.1.P.Seq    F  M00023347:312   CH04MAL
159     561359   RTA22200003F.m.08.1.P.Seq    F  M00055703:26    CH15CON
160     7607     RTA22200225F.m.20.1.P.Seq    F  M00005520:512   CH02COH
161     7750     RTA22200242F.f.06.1.P.Seq    F  M00027006:81    CH04MAL
162     410554   RTA22200012F.i.21.1.P.Seq    F  M00056729:812   CH16COP
163     2315     RTA22200230F.e.24.1.P.Seq    F  M00007135:211   CH02COH
164     561734   RTA22200014F.f.10.2.P.Seq    F  M00056961:712   CH16COP
165     4420     RTA22200229F.f.02.1.P.Seq    F  M00006964:lll   CH02COH
166     559663   RTA22200012F.d.17.1.P.Seq    F  M00056697:53    CH16COP
167     7082     RTA22200235F.p.01.2.P.Seq    F  M00022565.15    CH03MAH
168     2315     RTA22200230F.f.01.1.P.Seq    F  M00007135:211   CH02COH
169     650472   RTA22200012F.j.21.1.P.Seq    F  M00056737:77    CH16COP
170     6482     RTA22200230F.a.10.1.P.Seq    F  M00007096:51    CH02COH
171     4584     RTA22200230F.a.09.1.P.Seq    F  M00007096:52    CH02COH
172     453846   RTA22200012F.b.14.1.P.Seq    F  M00056674:55    CH16COP
173     650820   RTA22200011F.p.05.1.P.Seq    F  M00056656-83    CH16COP
174     642906   RTA22200005F.g.06.1.P.Seq    F  M00055887.36    CH15CON
175     448805   RTA22200005F.i.23.1.P.Seq    F  M00055912:35    CH15CON
176     649667   RTA22200006F.k.18.2.P.Seq    F  M00056082:66    CH15CON
177     735786   RTA22200012F.m.12.1.P.Seq    F  M00056758:35    CH16COP
178     121457   RTA22200012F.p.18.1.P.Seq    F  M00056785:68    CH16COP
179     372960   RTA22200012F.m.06.1.P.Seq    F  M00056756:28    CH16COP
180     120049   RTA22200012F.j.10.1.P.Seq    F  M00056733:49    CH16COP

Table 2A Nearest Neighbor (BlastN vs. GenBank)

SEQ ID  ACCESSION    DESCRIPTION                                                  P VALUE
148     U05237       Human fetal Alz-50-reactive clone 1 (FAC1) mRNA,             3.8
                     complete cds.
149     U92856       Comptonia peregrina maturase (matK) gene, chloroplast        3.8
                     gene encoding chloroplast protein, complete cds
150     X94165       Human papillomavirus type 73 E6, E7, E1, E2, E4, L2,         3.7
                     and L1 genes
151     U47875       Drosophila azteca NDSSC 14012-0171,6 glycerophosphate        3.7
                     dehydrogenase (Gpd) gene, partial cds
152     X02882       Human HLA class II alpha chain gene DZ-alpha                 3.7
153     AF005932     Clavispora opuntiae Spt3 (SPT3) gene, complete cds           3.7
154     Z1184U       D.melanogaster hedgehog gene DNA                             3.7
155     U06745       Arabidopsis thaliana ecotype Landsberg K+ transport          3.7
                     system AKT1 gene, complete cds.
156     U63362       Unidentified crenarchaeote 16S ribosomal RNA gene,           3.7
                     5' partial sequence
157     D30810       Wheat gene for transcription factor HBP-1b(c38), final       3.7
                     exon, partial cds
158     X56089       X. laevis mRNA for alpha-subunit of G-protein, type G        3.7
                     alpha-i-1
159     X07701       Chironomus tentans Balbiani ring mRNA BR 2.1 3'-end          3.7
160     X64649       G.gallus mRNA for restrictin                                 3.7
161     Y13426       Homo sapiens TCRDV2 gene, partial                            3.7
162     Y14443       Homo sapiens mRNA for zinc finger protein                    3.7
163     U92794       Mus musculus alpha glucosidase II beta subunit mRNA,         3.7
                     complete cds
164     Y09480       A.europaeus genes encoding dehydrogenase and                 3.7
                     cytochrome c
165     NM_001659.1  Homo sapiens ADP-ribosylation factor 3 (ARF3) mRNA           3.7
                     > :: gb|M74491|HUMADPRF3A Human ADP-ribosylation
                     factor 3 mRNA, complete cds.
166     L20893       Rice yellow mottle virus complete genome.                    3.7
167     AF019759     Canis familiaris beta-glucuronidase (GUSB) mRNA,             3.7
                     complete cds
168     U62587       Cricetulus griseus beta-1,6-N-acetylglucosaminyltransferase  3.7
                     Lec4A cell line point mutant mRNA, complete cds
169     D50085       Cucumis sativus mRNA for NADPH-protochlorophyllide           3.7
                     oxidoreductase, complete cds
170     M81890       Human interleukin 11 (IL11) gene, complete mRNA.             3.7

Table 2B Nearest Neighbor (BlastX vs. Non-Redundant Proteins)

SEQ ID  ACCESSION  DESCRIPTION                                                      P VALUE
159     4454483    (AC006234) putative kinase, 5' partial                           9.1
160     3044086    (AF055904) unknown [Myxococcus xanthus]                          5.4
161     1945493    (U56965) Similar to NAD(P) transhydrogenase, mitochondrial;      5.4
                   coded for by C. elegans cDNA yk27cL5; coded for by C.
                   elegans cDNA yk35b9.5; coded for by C. elegans cDNA
                   yk35b9.3; coded for by C. elegans cDNA yk161c9.3; coded for
                   by C. elegans ...
162     730883     SYNAPTIC VESICLE PROTEIN 2 (SV2) norvegicus]                     5.4
163     2351212    (D88386) gag-pol polyprotein (precursor protein) [Friend         4.2
                   murine leukemia virus]
164     1722738    MINOR CAPSID PROTEIN L2 >gi|1020224 type 36]                     3.2
165     4115922    (AF118222) contains similarity to ubiquitin carboxyl-terminal    2.4
                   hydrolase family 2 (Pfam:PF00443, score=48.3, E=3.5e-13,
                   N=2) and (Pfam:PF00442, Score=40.0 E=5.2e-08, N=1)
                   [Arabidopsis thaliana]
166     1351639    VERY HYPOTHETICAL 52.7 KD PROTEIN C8A4.05C IN                    2.4
                   CHROMOSOME I >gi|2130446|pir||S62521 hypothetical
                   protein SPAC8A4.05c - fission yeast
167     4506857    small inducible cytokine subfamily D (Cys-X3-Cys), member 1      2.4
                   (fractalkine, neurotactin) >gi|1888523 (U84487) CX3C
                   chemokine precursor [Homo sapiens] >gi|1899259 (U91835)
                   CX3C chemokine precursor [Homo sapiens] >gi|3252821
168     4505637    protocadherin 8; PCDH8 sapiens]                                  1.9
169     3249055    (AF071210) casein kinase II alpha subunit [Spodoptera            1.4
                   frugiperda]
170     854065     (X83413) U88 [Human herpesvirus 6]                               1e-005
171     854065     (X83413) U88 [Human herpesvirus 6]                               1e-005
191     3150072    (AF046996) preS1 surface protein [woolly monkey hepatitis B      6.9
                   Virus]
192     2622845    (AE000928) corrinoid/iron-sulfur protein, large subunit          6.9
193     1083554    tyrosine phosphoprotein SLP-76 - mouse                           6.9
194     1708868    LOW-DENSITY LIPOPROTEIN RECEPTOR-RELATED                         5.2
                   PROTEIN PRECURSOR (LRP) Caenorhabditis elegans
                   >gi|156360 (M96150) LDL receptor-related protein
                   [Caenorhabditis elegans] Genefinder; Identity to C. elegans Low
                   density lipid (LDL) receptor-related protein (TR:G15636
195     3169030    (AL023702) putative insertion element IS1647 transposase         4
                   [Streptomyces coelicolor]
196     728835     !!!! ALU SUBFAMILY SC WARNING ENTRY                              4
197     484695     vascular cell adhesion molecule 1 - human                        3.9

Table 5

SEQ  CLST    Library Pair (A, B)                               A    B    A/B    B/A
             03,04 (Breast, High Met vs. Breast, Non-Met)      7    118         17.28
             01,02 (Colon, High Met vs. Colon, Low Met)        25   4    5.76
164  561734  15,16 (Normal Colon vs. Colon Tumor Tissue)       0    6           5.68
165  4420    03,04 (Breast, High Met vs. Breast, Non-Met)      6    0    5.85
             01,02 (Colon, High Met vs. Colon, Low Met)        1    10          10.84
166  559663  15,16 (Normal Colon vs. Colon Tumor Tissue)       15   4    3.96
167  7082    03,04 (Breast, High Met vs. Breast, Non-Met)      10   0    9.76
168  2315    03,04 (Breast, High Met vs. Breast, Non-Met)      7    118         17.28
             01,02 (Colon, High Met vs. Colon, Low Met)        25   4    5.76
169  650472  15,17 (Normal Colon Tissue vs. Colon Metastasis)  6    0    6.44
170  6482    01,02 (Colon, High Met vs. Colon, Low Met)        0    6           6.5
171  4584    01,02 (Colon, High Met vs. Colon, Low Met)        1    11          11.93
172  453846  15,17 (Normal Colon Tissue vs. Colon Metastasis)  0    11          10.25
173  650820  16,17 (Colon Tumor Tissue vs. Colon Metastasis)   8    0    8.12
174  642906

We Claim:

1. An isolated polynucleotide comprising a nucleotide sequence which hybridizes under
stringent conditions to a sequence selected from the group consisting of SEQ ID NOS:1-2396.

2. An isolated polynucleotide comprising at least 15 contiguous nucleotides of a nucleotide
sequence having at least 90% sequence identity to a sequence selected from the group consisting of:
SEQ ID NOS:1-2396, a degenerate variant of SEQ ID NOS:1-2396, an antisense of SEQ ID NOS:1-2396,
and a complement of SEQ ID NOS:1-2396.

3. A polynucleotide comprising a nucleotide sequence of an insert contained in a clone
deposited as clone number xx of ATCC Deposit Number xx.

4. An isolated cDNA obtained by the process of amplification using a polynucleotide
comprising at least 15 contiguous nucleotides of a nucleotide sequence of a sequence selected from
the group consisting of SEQ ID NOS:1-2396.

5. The isolated cDNA of claim 4, wherein amplification is by polymerase chain reaction
(PCR) amplification.

6. An isolated recombinant host cell containing the polynucleotide according to claims 1, 2,
3, or 4.

7. An isolated vector comprising the polynucleotide according to claims 1, 2, 3, or 4.

8. A method for producing a polypeptide, the method comprising the steps of:
culturing a recombinant host cell containing the polynucleotide according to claims 1, 2, 3,
or 4, said culturing being under conditions suitable for the expression of an encoded polypeptide;
recovering the polypeptide from the host cell culture.

9. An isolated polypeptide encoded by the polynucleotide according to claims 1, 2, 3, or 4.

10. An antibody that specifically binds the polypeptide of claim 9.

11. A method of detecting differentially expressed genes correlated with a cancerous state of
a mammalian cell, the method comprising the step of:
detecting at least one differentially expressed gene product in a test sample derived from a
cell suspected of being cancerous, where the gene product is encoded by a gene comprising an
identifying sequence of at least one of SEQ ID NOS:1-2396;
wherein detection of the differentially expressed gene product is correlated with a cancerous
state of the cell from which the test sample was derived.

12. A library of polynucleotides, wherein at least one of the polynucleotides comprises the
sequence information of the polynucleotide according to claims 1, 2, 3, or 4.

13. The library of claim 12, wherein the library is provided on a nucleic acid array.

14. The library of claim 12, wherein the library is provided in a computer-readable format.

15. A method of inhibiting tumor growth by modulating expression of a gene product, the
gene product being encoded by a gene identified by a sequence selected from the group consisting of
SEQ ID NOS:1-2396.

SEQUENCE LISTING

<110> Williams, Lewis T.
      Escobedo, Jaime
      Innis, Michael A.
      Garcia, Pablo Dominiguez
      Sudduth-Klinger, Julie
      Reinhard, Christoph
      Giese, Klaus
      Randazzo, Filippo
      Kennedy, Giulia C.
      Pot, David
      Kassam, Altaf
      Lamson, George
      Drmanac, Radoje
      Crkvenjakov, Radomir
      Dickson, Mark
      Drmanac, Snezana
      Labat, Ivan
      Leshkowitz, Dena
      Kita, David
      Garcia, Veronica
      Jones, Lee William
      Stache-Crain, Birgit

<120> Human Genes and Gene Products

<130> 1624.002

<150> 60/188,609
<151> 2000-03-09

<160> 2396

<170> FastSEQ for Windows Version 4.0

<210> 1
<211> 362
<212> DNA
<213> Homo sapiens

<220>
<221> misc_feature
<222> (1)...(362)
<223> n = A,T,C or G

<400> 1
atggggacgg aggaggaaag aaagacagcc actgggttcc ccgtgttcac acacttggtc 60
tctttctctc tttctgctga ctggtctgag ggttagcatt tgtatccgca aaatggcttt 120
tgagtcctca cagacagtgg ctttgagaaa cctgctcttg gtgtccccac atgacctcat 180
tggtcaacct tagtgctgct aacagcaaga caagcagata ctgtgtgcat tccgacatga 240
ggcagtacaa agtacatagt atcacctagg aactagtctt gccaaaagca gaggggggca 300
gggggagaca gagagacaca nagagagaaa cagagaccgt gacagtgaga aatttaacct 360
an 362

accaccacct gggaccaacc ttcagctctg gaaccttcat aaagcaggtc agcgtggcct 120
gattgtccca ggacctgaag ggagcaagga tggcctcagg gcctggagaa gtctgctact 180
ctgtccttac tgctgaacat cctgcttgta tcaggaaact cagaagcagt ttgccttgtc 240
aaattcaatc tcaatggcca ttgtccacat aactgatcac ccatggctgc ctctcctatt 300
atctattatc actgaaactt agtagcctgc tttttttttt taaagctatg gcgaatcttc 360
cctgttgggg atccttgaac ctggtttgag ttttccc 397

<210> 166
<211> 314
<212> DNA
<213> Homo sapiens

<400> 166
tccagtttaa aggaacatgg gccgggcgcg gtggctcatg cctgtaatct cagcactttg 60
ggaggccgag gagggaggac cacctgactt tagaagatga agaacaacct gtgcatcatg 120
ttgctgaacc tgatctagaa aatggtggta ccacagcctc cggcttagaa catgaaaaga 180
agtgtgcaga cttacccctt acggactctt tatgagtttg ttcccccctt tggagacttc 240
ccctcgctgc cttttctgcg tattataccc cccaacatct tgggtgggtc ccctcgctga 300
ccttaaaaat taaa 314



<210> 167
<211> 396
<212> DNA
<213> Homo sapiens

<400> 167
cggcggagct gtgagccggc gactcgggtc cctgaggtct ggattctttc tccgctactg 60
agacacggcg ggtaggtcca caggcagatc caactgggag ttgaagtgtg agtgagagtg 120
aagaggaacc agcaggcttc cggagggttg tgtggtcagt gactcagagt gagaaggccc 180
tcgaagtcgt cgtccctctc atgcggtgcc acgcccatgg accttcttgt ctcgtcacgg 240
ccataactag ggaggaagga gggccgagga gtggaggggc tcaggcgaag ctggggtgct 300
gttgggggta tccgagtccc agaagcacct ggaaccccga cagaagattc tggactcccc 360
agacgggacc aggagaggga cggcatgagc ggtatg 396



<210> 168
<211> 397
<212> DNA
<213> Homo sapiens

<400> 168
cgttgctgtc gggcacgtgc caccacgccc ggccaatttt tgtattctta gtggagacgg 60
ggtttcgcta tgttggtcag gctggttttg aactcctgat ttccggtgat ccaccaccct 120
cggccttcca aagtgctggg attacaggcg tgagccaccg cgcctggccg gaaatcatgt 180
aatttaaaac tatatatggg tgtcttaggc ggcatcggtc ccaactctaa agtacgcgtt 240
agacgggcct gggccagaag tgggccatgg agacctcggg acccgcaggg ctgccgcccg 300
acccagcgag cctctgaagg tgcaccgcca cccccactgt ttatcttact gcctcatagt 360
aggcacattg tcgttctcaa tataattgca cacagtt 397



<210> 169
<211> 183
<212> DNA
<213> Homo sapiens

<400> 169
ctggtacggg tcggataatc ttcgtaatgg tgccggtgtg cctcgcttat taagttgatc 60
gcttgtggaa ctatttcctt gggagcgtgt gcgaatcccc tgcgtttttt ttttgaatga 120
cgtccatttt ttttcgtgaa tgaagtgtcg ttcttctttt tcgttgtgct gtttctcatg 180

DESC Novel human diagnostic and therapeutic gene #167.

KW Human; cancer; breast; lung; colon; prostate; cytostatic; diagnostic; ss.
ORGN Homo sapiens.

AB The invention relates to new polynucleotides and polypeptides, useful for
diagnosis and treatment of breast, lung and colon cancer. The sequences can
be used in detecting differentially expressed genes correlated with a
cancerous state of a mammalian cell, comprising detecting at least one
differentially expressed gene product in a test sample derived from a cell
suspected of being cancerous. They can also be used to inhibit tumour growth
by modulating expression of a gene product. AAS36943-AAS39338 represent
novel human diagnostic and therapeutic coding sequences of the invention.

NA 82 A; 97 C; 146 G; 71 T; 0 other

SQL 396

SEQ

1 cggcggagct gtgagccggc gactcgggtc cctgaggtct ggattctttc
51 tccgctactg agacacggcg ggtaggtcca caggcagatc caactgggag
101 ttgaagtgtg agtgagagtg aagaggaacc agcaggcttc cggagggttg
151 tgtggtcagt gactcagagt gagaaggccc tcgaagtcgt cgtccctctc
201 atgcggtgcc acgcccatgg accttcttgt ctcgtcacgg ccataactag
251 ggaggaagga gggccgagga gtggaggggc tcaggcgaag ctggggtgct
301 gttgggggta tccgagtccc agaagcacct ggaaccccga cagaagattc
351 tggactcccc agacgggacc aggagaggga cggcatgagc ggtatg
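
The NA field of the record above reports a base composition for the sequence. A composition line of that form can be recomputed from the sequence text along the following lines. This is only a sketch: the helper names `clean_sequence`, `base_composition` and `na_line` are hypothetical, and the assumption that the NA field is a simple per-base count is an inference from its layout, not something the record states. The example input is the first two lines of the SEQ block (positions 1 to 100).

```python
from collections import Counter


def clean_sequence(record_lines):
    """Join record lines, dropping position numbers and spaces."""
    return "".join(ch for line in record_lines for ch in line if ch.isalpha()).lower()


def base_composition(seq):
    """Count a, c, g and t, and lump every other symbol under 'other'."""
    counts = Counter(seq)
    comp = {base: counts.get(base, 0) for base in "acgt"}
    comp["other"] = len(seq) - sum(comp.values())
    return comp


def na_line(comp):
    """Format the counts in the style of the record's NA field."""
    return "NA {a} A; {c} C; {g} G; {t} T; {other} other".format(**comp)


# First two lines of the SEQ block (positions 1-100), used as example input.
lines = [
    "1 cggcggagct gtgagccggc gactcgggtc cctgaggtct ggattctttc",
    "51 tccgctactg agacacggcg ggtaggtcca caggcagatc caactgggag",
]
seq = clean_sequence(lines)
comp = base_composition(seq)
print(len(seq), na_line(comp))
```

Running the same functions over all eight lines of the SEQ block gives a direct check of the transcribed sequence against the stated length (SQL 396) and NA counts.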



